Genome Editing of Germline Mitochondria

PAGE 459 

Currently, there is no treatment for inherited disorders caused by mutations in mtDNA. Using 
mitochondria-targeted nucleases, Reddy et al. eliminate germline mtDNA mutations to
prevent their transgenerational transmission. This strategy represents a potential therapeutic
avenue for treating human mitochondrial disease. 

T Cells Keep Their Distance 

PAGE 486 

T cells are typically thought to kill cells from within an infected tissue. However, Guidotti et al. 
find that in viral hepatitis, cytotoxic T cells control the infection without migrating into the 
liver parenchyma. Rather, they arrest within the small blood vessels that permeate the liver 
and probe proximal hepatocytes for the presence of antigens, secreting cytokines and killing infected cells. Liver fibrosis limits 
this process, explaining why immune surveillance is compromised during chronic hepatitis. 

Tackling Replication Head On 

PAGE 513 

Eukaryotic DNA replication gets underway when two copies of the replicative helicase are loaded at an origin to initiate bidirectional
replication. Using single-molecule assays, Ticau et al. show that distinct mechanisms load the two helicases and that
interactions between the first and second helicases ensure a head-to-head architecture, setting the origin up for bidirectional 
DNA synthesis. 

NET Capture of Human Transcription 

PAGE 526 and PAGE 541 

Nojima et al. and Mayer et al. develop independent methods rooted in native elongating transcript sequencing in human cells
to capture high-resolution snapshots of the dynamic events during transcription. The studies provide insights into Pol II
regulation and movement and into the coordination between transcription and pre-mRNA splicing, together revealing
multiple layers of transcriptional regulation.

Cracking Compacted Chromatin 

PAGE 555 

Pioneer transcription factors access compacted chromatin and initiate cell-fate changes. Soufi et al. now discover that this 
characteristic activity, important for initiating reprogramming, relates to a transcription factor’s (TF’s) ability to target partial motifs displayed on the
nucleosome surface. For other TFs, tagging along with a pioneer factor enhances their partial motif recognition and allows 
nucleosome binding. 

Gradient on a Curve 

PAGE 569 

Intestinal villi form through mechanically induced buckling of the epithelial surface. Shyer et al. show that this change in tissue 
architecture concentrates the morphogen sonic hedgehog under the villus tip, thereby restricting stem cells to the villus base. 

Lipids Mediate Chatter across the Membrane 

PAGE 581 

Raghupathy et al. discover that transbilayer interactions mediated by long acyl chain-containing lipids are pivotal in generating
actin-dependent nanoclusters of outer-leaflet lipid-anchored proteins. Inner-leaflet phosphatidylserine serves
as the link between the outer-leaflet lipid and the actin cytoskeleton. These interactions
could form the basis for generating functional lipid domains at the plasma membrane.



“Boyhood” for Neutralizing Antibodies 

PAGE 470 

Broadly neutralizing antibodies control HIV infection but are hard to elicit. To understand
how they develop, Wu et al. followed the antibody response in a patient
during 15 years of HIV-1 infection. They find that the potent VRC01 broadly
neutralizing antibody lineage evolves from a single B cell. Antibody maturation occurs
at a gradually decreasing evolutionary rate over many years, matching the rate of evolution
of the virus, in order to achieve extraordinary antibody diversity necessary for viral
neutralization.
Engineering Your Own Escape

PAGE 501 

Type VII protein secretion is critical for virulence of several medically important pathogens. 
Rosenberg et al. reveal that for translocation to occur, the virulence protein EsxB first has to 
help assemble and activate the structure that secretes it. 

The Benefit of the Burn 

PAGE 595 

Lactate is a well-known product of anaerobic metabolism, and now Lee et al. demonstrate that it 
can signal a hypoxic response that is independent of the well-characterized HIF pathway. 
Lactate binds to and stabilizes the NDRG3 protein, enabling it to activate signaling that promotes
cell growth and angiogenesis. Thus, lactate can serve a protective role under hypoxic stress. 



Edema Explained

PAGE 610

Neuronal swelling is the major cause of death in traumatic and ischemic brain injuries. Rungta et al. reveal that this process
is initiated when aberrant sodium entry and depolarization activate the voltage-gated chloride channel SLC26A11.
The rise in cytoplasmic sodium and chloride causes an osmotic imbalance that leads to water entry and cytotoxic
edema, a mechanism that could be targeted to prevent and treat brain edema.

Coincidence Detection in Maternal Inheritance

PAGE 634

Stronger effects of maternal genomes than paternal ones on offspring have been attributed to maternal RNA and imprinting.
However, in the case of mutations causing congenital eye disease, Chou et al. show that the skewed inheritance pattern
results from altered binding interactions that disrupt vitamin A delivery combinatorially when they occur both within
the fetus and in the maternal tissues supplying the placenta. The findings define a new type of physiological maternal inheritance.

Ripple Effect of Human Mutations

PAGE 647

To determine the effects of disease-associated mutations on protein activities in the context of biological networks, Sahni
et al. systematically characterize the protein-chaperone, protein-protein, and protein-DNA interactions of missense
alleles implicated in human genetic disorders. The analysis reveals surprisingly widespread and specific perturbations
of macromolecular interactions by disease alleles. Distinct disease mutations in the same gene that give rise to different
interaction profiles often result in distinct disease phenotypes.

Any Sequence, Every Protein

PAGE 661

Approaches for assessing which transcription factors (TFs) bind to a given DNA sequence are currently limited, making it difficult
to elucidate gene regulatory architecture and to understand the impact of mutations in non-coding regions. Bass et al. move
to fill this gap by applying yeast one-hybrid assays to screen human TFs against enhancers and against disease-associated mutations in
non-coding regions. The study uncovers principles of TF-enhancer interaction in disease and development and provides
a readily expandable resource of candidate interaction patterns.

A TOP Method to Get Proteins into Cells

PAGE 674

While methods for introducing nucleic acids into target cells are well developed, delivering proteins is more challenging, particularly
when transducing primary cells. A new method (iTOP) developed by D’Astolfo et al. addresses this challenge, drawing on
salt-mediated macropinocytosis in conjunction with a small molecule. This technique may be especially
helpful for manipulating cells that are otherwise difficult to transfect or for gene editing
in primary cells.

Finding the Nerve to Breathe

PAGE 622

The vagus nerve is a complex, heterogeneous assembly of neurons controlling many aspects
of physiology. Chang et al. begin to get a molecular genetic handle on these neurons by
identifying two classes of molecularly distinct sensory cells with differing lung-to-brain
connectivity patterns. Optogenetic activation of one class acutely silences breathing, trapping
animals in a state of exhalation, while activating the other causes rapid and shallow breathing.

HIV Antibodies Return to Clinical Trials



Almost 35 years after the first cases of immunodeficiency associated with HIV were reported, HIV infection has spread worldwide, affecting nearly 40 million people. Antiretroviral therapy has evolved at a fast pace, and combined drug therapy is now part of the standard care of HIV-infected individuals, successfully preserving their health and lifespan in most cases. Nonetheless, the seemingly endless capacity of the virus to subvert the host immune response and to persist in a latent state has been frustrating and challenges the expectation that a definitive cure and an effective vaccine would be easily achievable.

Immunotherapy using combinations of monoclonal antibodies injected into infected subjects to neutralize the virus and stimulate immune-mediated killing of infected cells was a particularly promising concept. However, it failed to demonstrate any efficacy in initial preclinical and clinical studies, in part due to the ability of the virus to rapidly mutate and escape the antibodies. Caskey et al. (2015) now return to this original concept and report the initial results of a first-in-human, dose-escalation phase 1 clinical trial of the 3BNC117 anti-HIV antibody in infected and uninfected people. The antibody infusion is safe and well tolerated, and a single injection reduces the viral load of infected subjects by 0.8 to 2.5 log10 for up to 1 month. What is different about the 3BNC117 antibody that gives it an advantage over the virus?
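The viral-load reduction above is expressed on a log10 scale, so each unit corresponds to a tenfold change. The following is a small illustrative sketch of converting it to a fold reduction. The helper names are hypothetical, and only the 0.8 and 2.5 endpoints come from the text.

```python
def fold_reduction(log10_drop: float) -> float:
    """Fold decrease in viral load corresponding to a drop of `log10_drop` log10 units."""
    return 10 ** log10_drop


def remaining_fraction(log10_drop: float) -> float:
    """Fraction of the starting viral load left after the drop."""
    return 10 ** (-log10_drop)


if __name__ == "__main__":
    for drop in (0.8, 2.5):  # endpoints of the range reported for 3BNC117
        print(f"{drop} log10 -> {fold_reduction(drop):.1f}-fold decrease, "
              f"{remaining_fraction(drop):.2%} of baseline remaining")
```

Under this conversion, the reported range spans roughly a 6-fold to a 316-fold decrease in viral load.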

In recent years, it has become clear that a small fraction of people living with HIV-1 develop a class of antibodies with a very particular feature: they potently neutralize many different HIV variants (West et al., 2014). These molecules, termed broadly neutralizing antibodies (bNAbs), usually target regions of the viral envelope that are conserved in viral isolates from different origins. They are therefore able to bind and neutralize a large fraction of viral variants.

Several potent anti-HIV broadly neutralizing antibodies have been identified to date and have the potential to be used in strategies to treat and prevent HIV spread in combination with other types of drugs. Image from iStockphoto/Eraxion.

Advances in methods to isolate and clone antibodies have allowed researchers to identify several bNAbs and to study their characteristics and the sites they bind on the virus. In contrast to the first generation of monoclonal antibodies, which were ineffective as HIV-1 immunotherapy, 3BNC117 is a very potent broadly neutralizing antibody, capable of neutralizing 195 out of 237 different HIV-1 strains (Scheid et al., 2011). Its target is the CD4-binding site on the viral protein gp120, the portion of the molecule that interacts with the CD4 receptor on host cells. Because this interaction is crucial for the initial stages of viral infection, the site is conserved among different strains of the virus. It is also one of the few sites on the protein that is not decorated with the glycan shield that protects gp120 against antibody recognition.

Broadly neutralizing antibodies are an almost perfect tool against not just HIV-1 but also other highly mutable viruses such as influenza, and stimulating individuals to produce them with vaccines is a current goal. The problem is that they are very difficult to elicit under natural conditions. Studies of their properties show that they accumulate a large number of somatic mutations, deletions, and insertions that are infrequent in conventional antibodies (West et al., 2014). They likely take many years to arise during natural infection. Why a few people develop them while most infected individuals do not is unknown. Identifying naturally generated bNAbs is an important strategy to overcome this barrier, as they could be used to treat infection or to provide passive protection against it. In fact, combinations of different potent broadly neutralizing antibodies suppress HIV-1 viral load in humanized mice. Additionally, therapy with single bNAbs can suppress viremia in non-human primates infected with a simian immunodeficiency virus that is closely related to the human virus (West et al., 2014).

The antibodies are only able to suppress HIV infection for a limited period in experimental models and in the new clinical trial. In humanized mice and non-human primates, viremia remains suppressed as long as the antibody concentration in the blood remains within the therapeutic range. In the new clinical trial, 3BNC117-resistant viruses emerged in a fraction of the patients 28 days after the infusion (Caskey et al., 2015). Still, the fact that some individuals responded to immunotherapy, even if transiently, renews hopes that this strategy is worth pursuing. As with antiretroviral drugs, combinations of different reagents may prove more efficacious than single-antibody therapy in humans. In addition, engineering bNAbs to improve their effector functions (for instance, antibody-dependent cellular cytotoxicity; Bournazos et al., 2014) or to increase their affinity for their targets could also contribute to their clinical efficacy. Finally, antibodies are expensive to produce, so they are unlikely to become the first line of treatment for patients with HIV as monotherapy. However, they could be combined with drugs that reactivate latent virus, an approach that could disrupt and potentially clear the latent HIV-1 reservoir (Halper-Stromberg et al., 2014). In such combinations, they may be critical to the development of the long-sought cure for HIV infection.
Wu et al. couple next-generation sequencing with structural analysis to illuminate the key processes that enable the natural evolution and selection of broadly neutralizing antibodies to HIV-1, providing a potential roadmap for the development of HIV-1 vaccine strategies to accelerate the induction of protective antibodies.



The generation of broadly reactive neutralizing antibodies is the holy grail of HIV-1 vaccine research, but no HIV-1 vaccine candidate has realized this goal to date. A substantial fraction of HIV-1-infected individuals develop broadly neutralizing antibody responses over time (Mikell et al., 2011; Gray et al., 2011). Interestingly, high viral loads and chronic antigen exposure typically appear to contribute to the generation of broadly neutralizing antibodies (Piantadosi et al., 2009; Liao et al., 2013). However, some broadly neutralizing antibodies have also been cloned from subjects with spontaneous control of viral replication. These antibodies typically have high levels of somatic mutation (Burton et al., 2012; West et al., 2014). Although the prospect of designing a vaccine that can induce this degree of somatic hypermutation is daunting, understanding the natural evolutionary path of these antibodies may provide important clues for the generation of vaccine immunogens and strategies that ultimately aim to recapitulate this pathway.

In a tour-de-force study in this issue of Cell, Wu et al. (2015) used next-generation sequencing coupled with detailed structural determinations to reconstruct the evolutionary process that produced a series of potent, broadly neutralizing antibodies. These antibodies were directed against the CD4 binding site in a single donor from 1995 to 2009. Evolutionary analyses highlight the remarkable diversity of the VRC01 lineage, with at least six heavy-chain lineages and five light-chain lineages. Interestingly, these clonal families fell into three major clades, with up to 25% intra-clade sequence divergence and up to 50% inter-family divergence. Each clade exhibited marked increases in somatic hypermutation over this period, suggestive of progressive evolution over 15 years. Remarkably, all clonal families were represented at the earliest time points, suggesting early selection followed by parallel, progressive expansion over the study period. Strikingly, new families reflecting the selection of novel germline B cell populations by the evolving virus did not emerge. Together, these data point to the early selection and progressive development of a finite set of naive B cell families.

Despite dramatic sequence diversity among the clades, all representative antibodies from each family recognized an almost identical footprint on the viral envelope, sharing up to 95% conservation in the paratope surface. However, each family evolved a different structural solution to reach this unusually deeply recessed site of vulnerability on the HIV-1 envelope. This illustrates that there are several immunologic solutions to the same structural antigenic problem. These results argue that the immune system harbors a remarkable capacity to explore a wide landscape of solutions to neutralize difficult epitopes. The early selection of several germline B cells followed by continuous evolution over a substantial period of time may therefore be critical for the generation of broadly neutralizing antibody responses.

It is well known that HIV-1 mutates at a remarkable frequency, ~1.5 substitutions per 100 nucleotides per year. Interestingly, the VRC01 lineage evolved even faster, incorporating ~2 substitutions per 100 nucleotides per year. Thus, the humoral immune response evolved more rapidly than the virus in this individual, suggesting a mechanism by which antibody lineages can achieve extraordinary diversity in the setting of chronic HIV-1 infection (Figure 1). Other broadly neutralizing antibody lineages evolved faster still, at 9 to 11 substitutions per 100 nucleotides per year. These were the V1V2-specific antibody CAP256 and the CD4 binding site-specific antibody CH103. It is unclear what explains these accelerated mutation rates. Possible factors include higher viral loads in the CAP256 and CH103 donors, paratope features that make neutralization easier to achieve, or peculiarities in the donors' host backgrounds. Alternatively, it may simply be that these antibodies evolved within the first year of infection under distinct inflammatory conditions. However, for all three antibodies, kinetic analyses suggested that the antibody response evolved more rapidly in early infection and slowed during later stages of infection. These data suggest the importance of developing vaccine strategies that drive rapid and persistent B cell selection at these levels. Defining the key triggers of accelerated somatic hypermutation would allow B cells to explore immunologic solutions more quickly and rigorously. Such triggers may therefore improve the ability of vaccines to elicit broadly neutralizing antibodies to HIV-1.
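The rates quoted above can be compared with simple arithmetic. The sketch below is for illustration only. It assumes a constant rate and linear accumulation of substitutions with no repeated hits at the same site. The text itself notes that evolutionary rates slowed during later infection, so the extrapolated totals are rough bookkeeping rather than a model of the data. The dictionary and function names are hypothetical.

```python
# Rates quoted in the text, in substitutions per 100 nucleotides per year.
RATES = {
    "HIV-1": 1.5,
    "VRC01 lineage": 2.0,
    "CAP256/CH103 (low end)": 9.0,
    "CAP256/CH103 (high end)": 11.0,
}


def expected_substitutions(rate: float, years: float) -> float:
    """Substitutions per 100 nt after `years`, assuming a constant rate and no repeat hits."""
    return rate * years


def rate_ratio(lineage_rate: float, reference_rate: float) -> float:
    """How many times faster a lineage evolves than a reference (e.g., the virus)."""
    return lineage_rate / reference_rate


if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name}: {rate_ratio(rate, RATES['HIV-1']):.2f}x the HIV-1 rate")
    # Linear extrapolation over the 15-year VRC01 study period (illustration only).
    for name in ("HIV-1", "VRC01 lineage"):
        print(f"{name}: ~{expected_substitutions(RATES[name], 15):.1f} per 100 nt over 15 years")
```

The ratios (about 1.33x for VRC01 and 6x to 7.3x for CAP256/CH103 relative to the ~1.5 HIV-1 figure) follow directly from the quoted numbers. The 15-year totals hold only under the constant-rate assumption.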

The concept that carefully selected Env immunogens may be able to guide B cell development down a particular pathway through sequential vaccination strategies has gained support. The evolutionary complexity highlighted in this study, however, suggests that designing or selecting discrete immunogens able to recapitulate antigen-driven B cell selection pathways will be challenging. Strategies that aim to deliver persistent immunogens, such as replicating vectors, are therefore being explored. Whether viral evolution is also required to drive broadly neutralizing antibody responses remains to be determined. A burst in viral diversity has been linked to the rapid evolution of neutralizing responses in certain cases (Liao et al., 2013). Yet some broadly neutralizing antibodies have been isolated from subjects who exhibit spontaneous control of viral replication and therefore reduced viral diversity.

Figure 1. Relative Kinetics of the Evolution of HIV-1 and the VRC01 Antibody Lineage

The antibody lineage evolved more rapidly than the virus did in this individual, suggesting a mechanism by which B cells can achieve extraordinary diversity in the setting of chronic HIV-1 infection.

Several unanswered questions remain. One key question is whether similar degrees of evolutionary complexity are required for the development of broadly neutralizing antibodies against other key targets. These include the V2 or V3 glycan-dependent epitopes, the membrane-proximal external region, and the gp120-gp41 binding interface. Another important question is whether triggers that drive accelerated somatic hypermutation can be defined and used to allow vaccine-elicited B cells to explore immunologic solutions more rapidly. Overall, the present studies chart the development of CD4 binding site antibodies in remarkable detail, providing insights into the plasticity of the immune response and the path to the generation of broadly neutralizing antibodies.
The first event in the initiation of eukaryotic DNA replication is the recruitment of the MCM2-7 ATPase, the core of the replicative DNA helicase, to origins. Ticau et al. use single-molecule imaging to reveal how ORC, Cdc6, and Cdt1 cooperate to load MCM2-7 onto DNA, enabling bidirectional replication.



Eukaryotic cells copy their vast genomes by initiating DNA replication from thousands of origins of replication. To ensure that replication initiates at every origin precisely once per cell cycle, cells divide the process of initiation into two stages. In G1 phase, two copies of the MCM2-7 ATPase are loaded onto origin DNA to form a “pre-replication complex” (pre-RC) (Figure 1). This process requires three “licensing” factors: a hexameric AAA+ ATPase called ORC (origin recognition complex), another AAA+ ATPase called Cdc6, and Cdt1. When it is first loaded, MCM2-7 encircles double-stranded DNA and is inactive as a DNA helicase. In S phase, the two MCM2-7 complexes associate with the helicase co-factors Cdc45 and GINS, forming two active CMG helicases. These helicases encircle single-stranded DNA, unwind the origin, and nucleate the assembly of two replisomes that travel away from the origin, copying DNA as they go. Importantly, once cells enter S phase, multiple mechanisms prevent de novo MCM2-7 loading onto origins. As a result, each origin fires only once per cell cycle. In this issue, Ticau et al. (2015) use single-molecule imaging to reveal how yeast MCM2-7 double hexamers are loaded at replication origins (Figure 1).

Recent studies showed that ORC, Cdc6, Cdt1, and MCM2-7 are necessary and sufficient for pre-RC assembly (Remus et al., 2009). They also revealed various MCM2-7 loading intermediates (Fernandez-Cid et al., 2013; Sun et al., 2013; Sun et al., 2014) and determined how MCM2-7 subunits interact via their N termini within the so-called double hexamer (Costa et al., 2014; Sun et al., 2014). However, the most fundamental question remains unanswered: how does an origin containing a single ORC-binding site support the head-to-head loading of two MCM2-7 molecules? Are the two MCM2-7 hexamers loaded simultaneously or one at a time? Are the two MCMs loaded via the same or different mechanisms? Does one DNA-bound ORC complex load both MCM2-7 hexamers, or does a second ORC bound at a cryptic site participate? Of the helicase-loading intermediates captured in recent structural studies, which complexes are on pathway? Other questions concern the exact roles of Cdc6 and Cdt1 and the order in which they arrive at and depart from the origin during licensing.

To answer these questions, Ticau et al. established a single-molecule loading assay with recombinant yeast proteins. A fluorescently labeled DNA containing the yeast origin of replication was immobilized on a coverslip and imaged via total internal reflection fluorescence microscopy. One or two fluorescently labeled licensing factors (for example, MCM2-7 and Cdc6, or MCM2-7 and Cdt1) and ATP were added to the flow cell. Protein binding to and release from DNA were then monitored in real time by co-localizing the fluorescent signals from the nucleic acid and the protein of interest. This assay determined the arrival and departure times of proteins relative to each other and identified short-lived intermediates not detected in ensemble or structural approaches. Photobleaching experiments established the stoichiometry of bound factors.
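Photobleaching stoichiometry relies on each fluorophore bleaching in a single, irreversible step. The number of discrete intensity drops in a spot's trace therefore estimates how many labeled molecules were present. The sketch below illustrates the idea on a synthetic trace. Its assumptions are not drawn from the study: the helper names are hypothetical, the per-fluorophore intensity `unit_intensity` is taken as known, the background is taken as zero, and a simple running median stands in for whatever step-detection method Ticau et al. actually used.

```python
from statistics import median


def running_median(trace, window=5):
    """Median-filter an intensity trace to suppress frame-to-frame noise."""
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        smoothed.append(median(trace[lo:hi]))
    return smoothed


def bleaching_steps(trace, unit_intensity, window=5):
    """Return the frames at which the trace drops by one fluorophore.

    Hypothetical helper. Assumes every fluorophore contributes roughly
    `unit_intensity` above a zero background and that bleaching is
    irreversible, so only drops below the lowest level seen so far count.
    """
    levels = [round(v / unit_intensity) for v in running_median(trace, window)]
    floor = levels[0]
    step_frames = []
    for frame, level in enumerate(levels):
        while level < floor:        # one entry per fluorophore lost
            floor -= 1
            step_frames.append(frame)
    return step_frames


if __name__ == "__main__":
    # Synthetic trace (hypothetical): two fluorophores of ~100 a.u. each.
    trace = ([205, 195, 210, 190, 200, 198, 202]
             + [105, 95, 102, 99, 101, 97, 103]
             + [3, -2, 1, 0, 2, -1, 0])
    steps = bleaching_steps(trace, unit_intensity=100)
    print(f"{len(steps)} bleaching steps at frames {steps}")
```

Only drops below the running floor are counted, so transient blinking back up to a higher level does not inflate the step count.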

Monitoring the binding of fluorescently labeled MCM2-7 hexamers to DNA revealed that MCM2-7 is recruited one hexamer at a time, providing definitive support for previous models (Fernandez-Cid et al., 2013; Sun et al., 2013, 2014). The authors then examined the relative timing of Cdc6 and Cdt1 recruitment to replication origins. MCM2-7 and Cdt1 form a hetero-heptameric complex in solution (Kawasaki et al., 2006), whereas DNA-bound ORC forms a complex with Cdc6 (Sun et al., 2012). The single-molecule approach showed that Cdc6 binds to ORC before MCM2-7·Cdt1 arrives at an origin, indicating that Cdc6 primes the origin recognition complex to recruit the first MCM2-7 hexamer (Figure 1). Moreover, after MCM2-7·Cdt1 binding, Cdc6 is always released before Cdt1. Interestingly, Cdc6/Cdt1 departure times are significantly longer after loading of the second MCM2-7 ring than after the first, suggesting that the two hexamers are recruited differently. In addition, the kinetics of Cdc6/Cdt1 departure suggest that several processes occur between the arrival of MCM2-7 and the release of Cdc6/Cdt1. The number and identity of these steps are unclear and should be investigated in future studies.
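Dwell-time statistics can put a lower bound on the number of hidden steps between two observed events, such as MCM2-7 arrival and Cdc6/Cdt1 release. One common statistic is n_min = ⟨t⟩²/Var(t). For a linear chain of irreversible steps, it equals the number of steps when all rates are equal and is smaller otherwise. The sketch below illustrates this under that assumption, using simulated dwell times rather than data from the study. It is not necessarily the analysis Ticau et al. performed, and the function name is hypothetical.

```python
import random
from statistics import mean, variance


def n_min(dwell_times):
    """Lower bound on the number of sequential rate-limiting steps.

    For a linear chain of irreversible steps, <t>^2 / Var(t) equals the step
    count when all rates are equal and is smaller otherwise.
    """
    return mean(dwell_times) ** 2 / variance(dwell_times)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated (hypothetical) dwell times: three equal steps of mean 1 s each,
    # i.e. a gamma distribution with shape 3 and scale 1.
    three_steps = [rng.gammavariate(3.0, 1.0) for _ in range(20000)]
    # A single rate-limiting step gives an exponential distribution.
    one_step = [rng.expovariate(1.0) for _ in range(20000)]
    print(f"three-step process: n_min = {n_min(three_steps):.2f}")
    print(f"one-step process:   n_min = {n_min(one_step):.2f}")
```

The sketch uses the sample variance (`statistics.variance`). This matters only for small numbers of observed events, where n_min itself becomes noisy.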

Finally, Ticau et al. examined ORC dynamics during pre-RC assembly. By simultaneously monitoring fluorescently labeled ORC and MCM2-7, they discovered that a single ORC complex remains bound to the origin during the arrival of both MCM2-7 hexamers. This rules out any models that require two ORCs to form a pre-RC (Figure 1). Importantly, ORC interacts with the C-terminal domains of MCM2-7 during loading (Sun et al., 2013), yet in the final pre-RC, the two MCM2-7 rings interact via their N termini. Therefore, ORC cannot recruit the first and second MCM2-7 hexamers by the same mechanism, in agreement with the longer departure time of Cdc6/Cdt1 after the second MCM2-7 arrival. As MCM2-7 complexes do not interact in solution, the first loaded MCM2-7 complex must adopt a conformation that is competent for interaction with the second MCM2-7 through their N termini. Notably, Cdc6 binding (presumably to ORC) precedes loading of the second MCM2-7 complex, suggesting that it is required for this event. It will be interesting to understand how Cdc6 performs this function, given its distal location relative to the second MCM2-7 loading event (Figure 1). Ticau et al. also revealed that ORC dissociates from the origin soon after loading of the second MCM2-7, disfavoring mechanisms in which one DNA-bound ORC assembles several pre-RCs.

Figure 1. Loading Stages of the Eukaryotic Replicative Helicase

During G1, the eukaryotic replicative helicase is loaded as a pre-RC complex at the replication origin (ACS in yeast). Each pre-RC consists of two MCM2-7 hexamers encircling dsDNA. Establishing the necessary head-to-head association of the hexamers involves cooperation between the origin recognition complex (ORC) and licensing factors Cdc6 and Cdt1. At the beginning of S phase, the pre-RC is activated and associates with Cdc45 and GINS to form two CMG complexes that travel on ssDNA away from the origin. Ticau et al. dissected the stages of pre-RC loading.

The work by Ticau et al. is complemented by a study in Molecular Cell (Duzdevich et al., 2015), which explores other facets of pre-RC assembly, as well as downstream events of origin activation. This work shows that Cdc6 reduces the affinity of soluble ORC for DNA, effectively ensuring that ORC normally binds DNA before Cdc6. Inducing pre-RC activation with yeast S-phase extract reveals that activation of the two MCM2-7 complexes in the pre-RC is temporally, and thus probably mechanistically, coupled. The study also confirms models featuring actively replicating forks containing a single copy of the MCM2-7 ATPase. Finally, experiments by Duzdevich and colleagues support the finding of Ticau et al. that one and the same ORC complex directs the loading of both MCM2-7 hexamers comprising the pre-RC.

The work by Ticau et al. illustrates the power of simultaneously labeling pairs of proteins and watching them assemble into a multi-protein complex. The work provides the most definitive roadmap to date of the complex process underlying pre-RC assembly and identifies which intermediates should be pursued in structural studies. Now that origin unwinding and replisome assembly have also been reconstituted with purified components (Yeeles et al., 2015), we can expect the full power of single-molecule analysis to be applied to understanding the dynamics of these processes.
In animal embryos, morphogen gradients determine tissue patterning and morphogenesis. Shyer et al. provide evidence that, during vertebrate gut formation, tissue folding generates graded activity of signals required for subsequent steps of gut growth and differentiation, thereby revealing an intriguing link between tissue morphogenesis and morphogen gradient formation.



The graded distribution of morphogens plays a fundamental role in many developmental and disease-related processes. Such morphogen gradients control cell differentiation in a concentration-dependent manner and thus provide positional information about the distance from the morphogen source (Wolpert, 1969; Figure 1A). In the neural tube, for example, the graded distribution of the signaling molecule Sonic hedgehog (Shh) triggers the specification of different neuronal subtypes along the dorsal-ventral axis (Dessaud et al., 2007). The molecular and cellular mechanisms leading to the formation of morphogen gradients have been analyzed in detail. Several models have emerged explaining gradient formation on the basis of signal production, spreading, and degradation (Kicheva et al., 2012). However, gradient formation has almost exclusively been analyzed in effectively planar, two-dimensional cell layers, where the signals spread within the plane of the tissue. Interestingly, recent work suggests that signaling within the zebrafish lateral line primordium can be spatially constrained by the formation of microluminal structures (Durdu et al., 2014). This points to the importance of incorporating three-dimensional tissue morphogenesis into models of graded signaling activity. In this issue of Cell, Shyer et al. (2015) present evidence that three-dimensional rearrangements of tissues can generate gradients of signaling molecules in the surrounding tissues. These results provide important insight into the coupling of tissue morphogenesis and gradient formation, with consequences for cell fate specification and tissue patterning.
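The production, spreading, and degradation picture summarized above (Kicheva et al., 2012) is often introduced through the simplest synthesis-diffusion-degradation model. In a flat, one-dimensional tissue, its steady state is an exponential with decay length λ = √(D/k). The sketch below solves that model numerically with a finite-difference scheme and compares the result with the exponential. It assumes a constant diffusion coefficient D and degradation rate k, a fixed source concentration, and arbitrary units. This is exactly the planar setting that Shyer et al. move beyond. The function name is hypothetical.

```python
import math


def steady_state_gradient(D, k, length, n, c0=1.0):
    """Finite-difference steady state of dc/dt = D * d2c/dx2 - k * c on [0, length].

    Sketch assumptions: one spatial dimension, constant D and k, a fixed
    concentration c0 at the source (x = 0), and c = 0 at the far end.
    Returns (positions, concentrations) on n + 1 evenly spaced nodes.
    """
    h = length / n
    m = n - 1                              # interior nodes to solve for
    diag = -(2.0 + k * h * h / D)          # main diagonal; off-diagonals are 1
    rhs = [0.0] * m
    rhs[0] = -c0                           # known source value moved to the right-hand side

    # Thomas algorithm: forward sweep over the tridiagonal system.
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = 1.0 / diag, rhs[0] / diag
    for i in range(1, m):
        denom = diag - cp[i - 1]
        cp[i] = 1.0 / denom
        dp[i] = (rhs[i] - dp[i - 1]) / denom

    # Back substitution.
    interior = [0.0] * m
    interior[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        interior[i] = dp[i] - cp[i] * interior[i + 1]

    xs = [i * h for i in range(n + 1)]
    return xs, [c0] + interior + [0.0]


if __name__ == "__main__":
    D, k = 1.0, 1.0                        # arbitrary units; decay length sqrt(D/k) = 1
    xs, cs = steady_state_gradient(D, k, length=10.0, n=1000)
    lam = math.sqrt(D / k)
    for i in (0, 100, 200, 300):
        print(f"x={xs[i]:.1f}  numeric={cs[i]:.4f}  exp(-x/lambda)={math.exp(-xs[i] / lam):.4f}")
```

Because λ depends only on D/k, doubling the diffusion coefficient lengthens the gradient by a factor of √2 rather than 2. Capturing the folded geometry studied by Shyer et al. would require solving the same equation on a curved, three-dimensional domain.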

The lumen of the gut in chick undergoes 
a series of morphogenetic processes 



transforming the initially smooth lumen 
lining into a surface densely decorated 
with individual villi, required for effective 
absorption of nutrients within the gut 
(Coulombre and Coulombre, 1958). This 
transformation is thought to be triggered 
by growth of the lumen surface coupled 
to compressive forces from surrounding 
tissues restricting the expansion of the 
proliferating tissue and thus causing the 
lumen surface to buckle. The transforma- 
tion of buckles into villi critically depends 
not only on general growth under spatial 
confinement but also on a drop in pro- 
liferation at the tip of the folds and redistri- 
bution of stem cells to the base of the 
forming villi. The study by Shyer et al. 
(2015) addresses the mechanism underlying
this redistribution of stem cells, which
are initially uniformly distributed in the 
early gut. 

Confirming previous work (Karlsson 
et al., 2000), the authors show that, in 
the distal mesenchyme of the nascent villi, 
a “villus cluster” forms. The cells of this 
cluster express several signaling factors 
inhibiting stem cell specification and pro- 
liferation in the overlying distal epithelium 
of the forming villi. This raises the question 
as to the molecular and cellular mecha- 
nisms by which the villus cluster is formed 
at the villi tip. The Shh signaling pathway 
has previously been implicated in the 
formation of the villus cluster. Thus, the 
authors hypothesized that local Shh 
signaling at the villi tip might induce the 
villus cluster. However, as the authors 
had previously shown that shh mRNA is 
uniformly distributed throughout the gut 
endoderm, mechanisms other than restricting
shh expression to tip cells of the
forming villi had to be tested.

In a set of elegant experiments, inspired
by predictions from theoretical modeling,
the authors show that the formation of
villi generates local maxima of Shh
signaling activity at the villi tips, which
are responsible for inducing the villus cluster
below. In their model, the authors assumed
that Shh is secreted equally by all endo- 
dermal cells, diffuses within the under- 
lying mesenchyme, and is degraded. 
Crucially, the morphological changes of 
the forming villus are captured by chang- 
ing boundary conditions, which lead to a 
steady-state concentration profile with 
maximum concentration at the tip of the 
villus; this maximum concentration in- 
creases as the villus grows more acute 
(Figure 1B). If the induction of the villus
cluster requires high Shh concentrations, 
this scenario would explain its localization 
to the tip. To test this scenario directly, the 
authors undertook explant experiments in 
which they either prevented buckling by flipping
the epithelium inside out or induced
premature folding by placing slabs of em- 
bryonic gut on fine grids forcing the sur- 
face to bend. These experiments clearly 
show that preventing gut buckling abol- 
ishes the localized induction of villus clus- 
ters, whereas forcing premature buckling 
induces premature villus clusters. The 
key role of Shh in this process was further 
supported by experiments showing that 
Shh protein displays a graded distribution 
with maxima at the villi tips and that 
modulating Shh signaling activity affects 
villus cluster formation. Collectively, these 
data provide strong support for an in- 
structive function of surface buckling in 
establishing local maxima of Shh sig- 
naling activity responsible for villus cluster 
formation. 
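The geometric intuition behind the tip maximum can be illustrated with a deliberately crude toy calculation. This is not the model used by Shyer et al. (2015), who solved a diffusion-degradation problem with changing boundary conditions. The sketch makes several assumptions. Every point of the epithelium secretes Shh at the same rate. Each source contributes exp(-r/λ) at distance r, a qualitative stand-in for diffusion plus degradation. The villus is idealized as a wedge whose two epithelial arms meet at the tip. The function name `tip_concentration` and all parameter values are hypothetical.

```python
import math


def tip_concentration(wedge_angle_deg, depth=1.0, arm_length=50.0,
                      decay_length=5.0, n=20000):
    """Toy Shh level in the mesenchyme just inside the tip of a villus.

    The epithelium is idealized as two straight arms meeting at the tip
    (the origin) with the given interior angle; 180 degrees is a flat,
    unfolded epithelium. Sources are spread uniformly along both arms and
    each contributes exp(-r / decay_length) at distance r (an assumed
    stand-in for diffusion plus degradation). The probe point lies on the
    bisector, `depth` units from the tip, inside the fold.
    """
    half_angle = math.radians(wedge_angle_deg) / 2.0
    ds = arm_length / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds  # midpoint rule: distance of this source from the tip
        for side in (1.0, -1.0):  # upper and lower epithelial arm
            sx = s * math.cos(half_angle)
            sy = side * s * math.sin(half_angle)
            r = math.hypot(depth - sx, sy)
            total += math.exp(-r / decay_length) * ds
    return total


if __name__ == "__main__":
    for angle in (180, 90, 45, 20):
        print(f"interior angle {angle:3d} deg -> tip signal {tip_concentration(angle):.2f}")
```

As the wedge narrows, every epithelial source moves closer to the probe point, so the summed signal rises steadily as the villus becomes more acute. This matches Figure 1B qualitatively, but the numbers themselves carry no biological meaning.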

Figure 1. Forming a Morphogen Gradient by Tissue Folding

(A) Schematic of morphogen gradient model: morphogens are secreted by source cells and form a graded concentration profile in the target tissue, where cells express different target genes (X, Y, and Z) and ultimately adopt different cell fates dependent on the morphogen concentration.

(B) Tissue folding leads to villi formation in the gut. Shh molecules are shown in blue. As the villi grow more acute, the maximal Shh concentration at the tip increases; the Shh concentration ultimately exceeds a high threshold (dotted lines) above which the formation of the villus cluster in the underlying mesenchyme is induced.

Several questions arise from this work.
Foremost, we still know very little about
how the Shh gradient forms: is Shh production/secretion
homogeneous? Does Shh simply diffuse in the extracellular
space? How does it get degraded?
Although the model predicts the genera- 
tion of local maxima of Shh signaling ac- 
tivity upon gut folding, there are several 
signaling-related processes beyond the 
geometrical change of the tissue that 
might be affected by the folding process 
itself. For instance, signal secretion from 
the gut epithelium to the villus cluster 
might be modified by changes in the 
apical-to-basal proportion of gut epithelial 
cells due to cell shape changes during the 
buckling process. Moreover, Shh signal 
propagation and degradation within the 
villus cluster mesenchyme might be 
modulated by cellular rearrangements 
within the cluster as a result of cluster 
shape changes during villi formation. 
Finally, reciprocal BMP signaling activity 
induced within the villus cluster by Shh 
signaling from the gut epithelium and 
required for restricting the proliferative 
activity within the forming villi might itself 
be altered by cluster shape changes during
the folding process. Experimentally
determining potential changes in such
processes during villi formation and incor- 
porating them as parameters into a theo- 
retical model of villi formation as a func- 
tion of Shh and BMP signaling will likely 
generate intriguing predictions about the 
behavior of this system, which, in turn, 
can be tested experimentally. 

Another issue, related to the points dis- 
cussed above, is the precise spatiotem- 
poral relationship between Shh and BMP 
signaling activity and tissue morphogen- 
esis. As observed for other feedback 
mechanisms (Brandman and Meyer, 
2008), the time delays between Shh/ 
BMP signaling and the different morpho- 
genetic processes leading to villi forma- 
tion (tissue folding and cell proliferation) 
will be critical for the outcome of the pro- 
cess. It will be interesting to determine 
how quickly mesenchymal cells can upregulate
BMP expression upon receiving Shh signals
from the villi tip and how quickly BMP-receiving
cells within the gut epithelium can switch off
proliferative activity. Again, experimentally
addressing such delays and incorporating
them as parameters in theoretical models
will likely produce informative predictions 
about the process itself. 
The existence, nature, and role of highly ordered membrane domains, often referred to as lipid rafts,
have been highly debated by cell biologists for many years. In this issue, Raghupathy et al. describe 
molecular mechanisms leading to the formation of ordered lipid-protein clusters. 



The lipid raft concept as a membrane 
organizing principle that modulates 
cellular functionality has been controver- 
sial ever since it was first proposed (Si- 
mons and Ikonen, 1997). The question 
was, do lipid rafts really exist, and if so, 
what is their exact composition, size, and 
lifetime? It has become the general notion 
that lipid rafts are at most transient molec- 
ular assemblies that might bring together 
different molecules on small spatial 
scales, leading to brief local increases in 
molecular order and compartmentaliza- 
tion that could influence cellular events, 
including signaling (Lingwood and Si- 
mons, 2010). But what about other sour- 
ces of membrane microheterogeneity? 

In a technical tour de force presented in 
this issue of Cell, Raghupathy et al. (2015)
use experiments and simulations to 
systematically analyze the molecular 
mechanisms underlying the formation of 
membrane assemblies comprising glyco- 
sylphosphatidylinositol (GPI)-anchored 
proteins (GPI-APs). Synthetic fluorescent 
GPI-AP analogs, which the authors incor- 
porate into the outer plasma-membrane 
leaflet of Chinese hamster ovary cells, 
exhibit nanoclustering on <100 nm scales,
as indicated by a decrease in fluorescence 
anisotropy due to Förster resonance energy
transfer effects (Goswami et al.,
2008). Surprisingly, nanocluster formation 
is remarkably dependent on the length of 
the acyl chain forming the GPI anchor: 
clustering is only observed for GPI-APs 
with long saturated acyl chains contain- 
ing > 18 carbon atoms, suggesting an 
interdigitation-based mechanism.
Nanoclustering diminishes upon cholesterol
depletion, in actin-depleted cell blebs, and
in mutant cell lines deficient in the
inner-leaflet lipid phosphatidylserine (PS). In the
PS-depleted cells, only the addition of PS
with at least one long saturated chain
restored nanoclustering. Intriguingly, the
effect is also enhanced upon expression 
of proteins specifically linking PS to the 
actin cytoskeleton— that is, protein do- 
mains capable of binding PS and able to 
mediate the interaction of the lipids with 
cytoplasmic actin filaments. Atomistic
molecular-dynamics simulations confirm the
observations with respect to the choles- 
terol-assisted inner-leaflet coupling of im- 
mobilized PS and GPI-APs, both of which 
again require long saturated acyl chains. 
The simulations also reveal an apparently 
high degree of molecular order in the nanoclusters.
Finally, like the GPI-APs, a fluorescent
phosphoethanolamine lipid analog containing a long
acyl chain exhibits
PS- and cholesterol-dependent nanoclus- 
tering in the outer leaflet. 
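Homo-FRET reports nanoscale proximity because Förster transfer depends steeply on distance: E = 1/(1 + (r/R0)^6). The sketch below tabulates E for an assumed Förster radius of R0 = 5 nm. That value is a typical order of magnitude chosen for illustration, not a value from Raghupathy et al. (2015). The function name `fret_efficiency` is hypothetical.

```python
def fret_efficiency(distance_nm: float, r0_nm: float = 5.0) -> float:
    """Förster transfer efficiency for one donor-acceptor pair at a fixed distance.

    r0_nm is an assumed, illustrative Förster radius.
    """
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


if __name__ == "__main__":
    for r in (2.5, 5.0, 7.5, 10.0, 20.0):
        print(f"{r:5.1f} nm -> E = {fret_efficiency(r):.3f}")
```

Doubling the separation from R0 to 2R0 cuts the efficiency from 0.5 to about 0.015. Appreciable transfer, and hence the anisotropy drop it causes, therefore requires fluorophores to sit within a few nanometers of one another.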

The experiments of Raghupathy et al. 
(2015) show that nanoclusters form by 
transbilayer coupling only in the presence 
of long saturated acyl chains, cholesterol,
and immobilization of one of the
partners (Figure 1). In the proposed
model, it is this immobilization, usually
by cortical actin, that determines where
and when the clusters will be stabilized. 
Thus, it is actin dynamics that control 
domain formation at the outer leaflet of 
the cell membrane. Taken individually, ca- 
veats could perhaps be identified for 
some of the experimental approaches 
used by Raghupathy et al. (2015). For 
example, it is not known if cell blebs, 
where nanoclustering is not observed, 
are truly actin free. However, the sum of 
the experimental and theoretical studies 
amounts to a convincing argument for a 
new mechanism of nanocluster formation. 
Interestingly, the experiments and simula- 
tions suggest that transbilayer coupling 
can work both ways. That is, when the 
GPI-APs are clustered and immobilized 
extracellularly, PS lipids form correlated 
patches intracellularly. Such effects might 
provide a mechanism for relaying signals 
from the extra- to the intracellular space 
of the cell. 

Figure 1. Ordered Membrane Domain Formation
(Schematic panels: Ordered, Disordered, Homo-FRET, Interleaflet coupling)

Raghupathy et al. (2015) provide evidence for inter-leaflet coupling as a mechanism of membrane nanoclustering of lipid-anchored molecules such as GPI-APs, but only in the presence of long saturated acyl-chain-anchored molecules on both sides, cholesterol, and, more importantly, immobilization of one of the partners, for example, by the cortical actin cytoskeleton.

Although the actin-dependent organization
of GPI-APs into clusters was established
some time ago (Goswami et al., 2008), with the underlying mechanism now
teased apart (Raghupathy et al., 2015), the 
question remains of how these instances 
of nanoclusters fit into the more general 
“lipid raft” concept. Specifically, what is 
the relation of these clusters to the many 
different types of molecular assemblies at 
the membrane that have all been defined 
as lipid rafts, including GPI-AP clusters? 
The generality of traditional raft models 
based on detergent-resistant membrane 
patches (Simons and Ikonen, 1997) or 
phase-separated model membranes of 
ternary lipid mixtures has been under- 
mined by multiple experiments (e.g., Hon- 
igmann et al., 2014a). Yet, the principle of 
phase separation as a structural and func- 
tional principle cannot be fully set aside. 
For example, giant plasma membrane ves- 
icles derived from living cells, which 
contain the cellular membrane proteins 
and lipids but lack cytoskeleton and are 
incapable of energy-dependent pro- 
cesses, have the potential to phase sepa- 
rate as well, with proteins and lipids 
showing a distinct preference for one of 
the phases, albeit at unphysiological tem- 
peratures (Sezgin et al., 2012). It seems 
likely that the sources of cell membrane 
heterogeneity are themselves heteroge- 
neous and that not all of the structures 
form via the same mechanism. For 
instance, although specific lipids, choles- 
terol, and/or the cortical cytoskeleton 
regulate some protein assemblies and 
membrane-assisted signaling events, 
others appear to be completely indepen- 
dent of these factors. Moreover, molecules 
observed to behave similarly in one experimental
context appear to have different
characteristics when observed using other 
approaches and conditions (e.g., different 
cells or expression levels). 

To avoid confusion, it seems very
important to refrain from generalizing from
single experimental or theoretical obser- 
vations and specifically to avoid the 
temptation to refer to all types of mem- 
brane assemblies as lipid rafts. As 
the authors themselves stress, the new 
work addresses only the formation of clus- 
ters of long-acyl-chain-containing lipid- 
anchored proteins, which may not as yet 
exemplify a general organizing principle. 
However, it is possible that processes 
similar to those depicted by Raghupathy 
et al. (2015) have an important role in the 
formation of other membrane assemblies. 
For instance, in several cases, the cortical 
cytoskeleton (Honigmann et al., 2014b; 
Kusumi et al., 2010) and inter-leaflet 
coupling (Spillane et al., 2014) have, 
among other factors such as membrane 
curvature (Larsen et al., 2015), been 
shown to drive the organization of mem- 
brane molecules. It will be of great interest 
to determine the extent to which the 
pinning of inner-leaflet lipids to cortical 
actin in combination with inter-leaflet 
coupling involving specific lipids and 
cholesterol drives the transient assembly 
of other types of membrane molecules, 
if at all. An important related question 
concerns whether the lifetimes of the 
observed structures are sufficient to influ- 
ence membrane protein function. At the 
very least, the new work sets the technical 
standard for these types of inquiries. 
Chou et al. discover a new mode of maternal inheritance by analyzing human mutations in plasma
retinol binding protein (RBP). Mechanistically, these mutations simultaneously lower RBP’s affinity 
for vitamin A and greatly increase its affinity for its cell-surface receptor, thus dominantly blocking 
the transmembrane transport of vitamin A. 



Vitamin A has been the light sensor for 
vision, or its equivalent, from life’s begin- 
nings. The diversification of its function 
to regulating cell growth and differentia- 
tion, which occurred about 500 million 
years ago, coincided with the emer- 
gence of a long-range and specific 
vitamin A transport system consisting 
of a blood carrier protein called plasma 
retinol binding protein (RBP) and its 
cell-surface receptor (STRA6), which 
mediates vitamin A uptake (Kawaguchi 
et al., 2007). Many organs depend on 
vitamin A action; however, the eye is still 
the organ most sensitive to vitamin A 
deficiency, the loss of RBP, or the loss 
of STRA6 (Zhong et al., 2012). Studies 
of human mutations leading to con- 
genital eye malformation allowed Tom 
Glaser and colleagues, in this issue of 
Cell, to uncover a new mode of maternal 
inheritance and its intricate mechanism 
involving RBP, STRA6, and vitamin A 
(Chou et al., 2015). 

Glaser’s team identified human RBP 
mutations that cause eye malformations
such as anophthalmia, microphthalmia, 
and coloboma, but the phenotypes are 
preferentially transmitted if the muta- 
tions are inherited from the mother. 
Through a series of elegant biochemical 
analyses, they discovered that these 
mutant RBPs block vitamin A transport 
by STRA6 in a precise and dominant 
manner. To understand this mechanism, 
it is useful to understand the major 
players involved in specific vitamin A 
transport. Under physiological condi- 
tions, vitamin A/RBP complex (termed 
holo-RBP) associates with transthyretin 
(TTR) to increase its molecular weight 
and thus prevent removal by kidney 
filtration. Holo-RBP dissociates from the
complex to bind to its cell-surface receptor
STRA6 due to its higher affinity for
STRA6 than for TTR (Kawaguchi et al., 
2011). STRA6 then catalyzes vitamin A 
release from holo-RBP and its transport 
into the cell where it is stored (Kawagu- 
chi et al., 2011). After losing its cargo, 
RBP (apo-RBP) has greatly decreased 
affinity for TTR and is lost through kidney 
filtration, which prevents its accumula- 
tion in the blood (Figure 1). RBP muta- 
tions identified in this study confer 
several properties that are important for 
their pathogenicity and the way they 
affect this process. First, while these 
mutant RBPs are secreted as efficiently as
the wild-type protein, they bind vitamin
A weakly and tend to lose their cargo in 
a receptor-independent manner. Sec- 
ond, like wild-type holo-RBP, the mutant 
ones can still bind to TTR; however, they 
have an affinity for STRA6 that is 30- to 
40-fold higher than that of wild-type 
RBP (Figure 1). These properties lead to 
the sequestration of STRA6 by the 
mutant RBP, decreasing its ability to 
transport vitamin A into the cell. Indeed, 
although the blood of mutation carriers 
may contain about 66% wild-type RBP 
and only 33% mutant RBP, the blocking 
effect is sufficient to cause the malforma- 
tion phenotypes. 
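A back-of-the-envelope competition calculation shows how a one-third share of mutant RBP could dominate the receptor. The sketch assumes simple equilibrium competitive binding of wild-type and mutant RBP to STRA6, so that occupancy is proportional to concentration divided by the dissociation constant. It uses the roughly 2:1 wild-type-to-mutant ratio and the 30- to 40-fold affinity difference quoted above. It ignores TTR, vitamin A loading, and receptor kinetics. The function name is hypothetical.

```python
def mutant_share_of_bound_stra6(wt_fraction: float, mutant_fraction: float,
                                affinity_fold: float) -> float:
    """Fraction of RBP-occupied STRA6 that holds mutant RBP (toy equilibrium model).

    Under simple competitive equilibrium, each ligand's occupancy is
    proportional to concentration / Kd, so the ratio of mutant- to
    wild-type-occupied receptors is (mutant / wt) * (Kd_wt / Kd_mutant).
    `affinity_fold` stands for Kd_wt / Kd_mutant.
    """
    ratio = (mutant_fraction / wt_fraction) * affinity_fold
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for fold in (1, 30, 40):
        share = mutant_share_of_bound_stra6(2 / 3, 1 / 3, fold)
        print(f"{fold:2d}-fold higher affinity -> {share:.1%} of bound STRA6 holds mutant RBP")
```

With equal affinities, the mutant would occupy only a third of the bound receptors. With a 30- to 40-fold affinity advantage, it occupies roughly 94% to 95% of them in this toy model.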

A longstanding puzzle in the field was 
the discrepancy between the way human 
mutations in RBP and STRA6 affect eye 
development. While mutations in STRA6 
cause anophthalmia and other developmental
defects (Pasutto et al., 2007), none of the
previously identified RBP mutations did.
As this study shows, a key to this
puzzle is the placenta’s role in vitamin A
delivery. Previously identified mutations
in RBP were recessive and thus unlikely
to inactivate both maternal and fetal
RBP because only the fetus could be 
homozygous for the mutation. In contrast, 
dominant RBP mutations, like those 
discovered in this study, can simulta- 
neously inactivate the mother’s and fetal 
RBP. Similarly, human mutations in RBP 
and its receptor do not necessarily 
generate the same phenotypes for the 
same reason. A human embryo without 
STRA6 can be directly impacted by the 
loss of placental absorption of vitamin A 
from maternal RBP. In contrast, a human 
embryo without RBP still has STRA6 
and functional maternal RBP to ensure 
vitamin A delivery to the embryo through 
the placenta. Thus, the dominant human 
RBP mutations mimic STRA6 mutations 
in their ability to inactivate both maternal 
and fetal RBP-mediated transport and 
therefore can cause anophthalmia. The 
fact that the dominant RBP mutation can 
only exert its effect on maternal delivery 
of vitamin A to the fetus if it is maternal 
explains this new mode of maternal 
inheritance. 
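The inheritance argument above reduces to a two-route rule, written out below as a deliberately simplified truth table. Following the reasoning in the text, the sketch makes three assumptions. A dominant RBP mutation blocks the maternal route whenever the mother carries it. It blocks the fetus's own RBP route whenever the fetus carries it. Severe malformation requires both routes to fail. The model ignores incomplete penetrance and RBP-independent delivery, and the function name is hypothetical.

```python
from itertools import product


def both_routes_blocked(mother_carrier: bool, fetus_carrier: bool) -> bool:
    """Toy rule: malformation risk only if maternal and fetal RBP routes are both blocked."""
    maternal_route_blocked = mother_carrier  # mutant RBP in maternal blood competes for STRA6
    fetal_route_blocked = fetus_carrier      # mutant RBP made by the fetus competes for STRA6
    return maternal_route_blocked and fetal_route_blocked


if __name__ == "__main__":
    for mother, fetus in product((False, True), repeat=2):
        print(f"mother carrier={mother!s:5}  fetus carrier={fetus!s:5}  "
              f"-> both routes blocked: {both_routes_blocked(mother, fetus)}")
```

When only the father carries the mutation, the mother's wild-type RBP keeps the maternal route open, so a carrier fetus falls in a protected row. Assuming the other parent is a non-carrier, only a maternally inherited mutation lands in the blocked row.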

Figure 1. Schematic Diagrams of RBP/Receptor Interaction

Schematic diagrams comparing RBP/receptor interaction under physiological conditions and the pathological conditions revealed by Chou et al. (2015). Under physiological conditions (left diagram), holo-RBP dissociates from the RBP/TTR complex to bind to the RBP receptor STRA6 due to its higher affinity for STRA6 than for TTR. STRA6 catalyzes retinol release from holo-RBP and retinol transport into the cell to be stored by binding to CRBP, or by conversion to retinyl esters by LRAT (not shown). After losing its cargo retinol, RBP (apo-RBP) can no longer bind to TTR and is lost through kidney filtration. As depicted in the right diagram, human RBP mutant A55T or A57T (marked in red in RBP structures) tends to lose vitamin A in a receptor-independent manner but has much higher affinity for STRA6 than wild-type RBP (the mutant RBP/receptor complex is marked by a red circle). These RBP mutants compete with wild-type RBP in binding to STRA6 to impede the transmembrane transport of vitamin A.

A related question highlighted by this
study is the contrast between humans
and mice in phenotypic variability even
for null mutations in this pathway. For
example, human pathologies caused by
STRA6 mutations are highly variable,
ranging from the “milder” phenotype
of anophthalmia to more systemic developmental
defects (Pasutto et al., 2007).
However, under standard laboratory
conditions, STRA6 knockout mice, like
RBP knockout mice, have vision-specific
phenotypes that lead to blindness due to
lack of vitamin A (Amengual et al., 2014;
Ruiz et al., 2012). As pointed out here, a
key to this puzzle is an RBP-independent
pathway mediated by vitamin A esters
(retinyl esters) that borrows the lipoprotein
delivery pathway. This pathway can
partly compensate for the lack of RBP 
if sufficient and constant vitamin A is 
available (Quadro et al., 2005). Retinyl 
ester bound to lipoprotein depends on 
immediate vitamin A intake from food 
because its source is the small intestine. 
Unlike the RBP system, the retinyl ester 
pathway is not homeostatically regu- 
lated, cannot mobilize the vast amount 
of vitamin A stored in the liver, and is 
directly influenced by fluctuating vitamin 
A intake (Green and Green, 1994). Therefore,
the pathological phenotype becomes highly
variable and ranges from eye-specific
pathology to embryonic lethality
when only this pathway is depended on 
for vitamin A delivery in mice (Quadro 
et al., 2005) or in humans (Pasutto 
et al., 2007). Another facet of this RBP-independent
pathway is that it is associated
with toxicity due to its lack of homeostatic
control and specificity (Smith
and Goodman, 1976). Retinyl esters accounting
for more than 10% of total vitamin A in the blood
are regarded as a sign of vitamin A overload
in humans.

This study also illustrates how RBP
and its receptor have been finely tuned 
for each other in evolution. Tighter bind- 
ing is not necessarily better and can 
even be pathogenic. Each RBP protein 
transports only one vitamin A molecule 
and STRA6 only takes up one vitamin A 
molecule at a time from RBP. Therefore, 
the STRA6/RBP interaction must be suf- 
ficiently strong and specific to achieve 
vitamin A uptake, but not so strong or
prolonged that it prevents the next RBP from
delivering its vitamin A molecule. The
unexpected existence of super-binding 
RBP mutants in humans revealed a 
new mode of inheritance depending 
on the genetic makeup of both mother
and fetus. 
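A toy three-state kinetic scheme for a single STRA6 receptor illustrates why tighter binding is not necessarily better. The scheme, the function names, and all rate values are hypothetical assumptions, not measurements from Chou et al. (2015). A free receptor binds holo-RBP at rate a. Bound holo-RBP either hands over its vitamin A at rate r or dissociates at rate k_off. Apo-RBP leaves at the same k_off, which assumes the affinity change acts only through dissociation. Solving for the steady state gives an uptake flux of J = r·a / (r + k_off + a + a·r/k_off). This flux falls toward zero both when binding is too weak and when it is too tight, and it peaks at k_off = sqrt(a·r).

```python
import math


def uptake_flux(a: float, r: float, k_off: float) -> float:
    """Steady-state vitamin A uptake per receptor in a toy three-state cycle.

    States: free receptor -(a)-> bound holo-RBP -(r)-> bound apo-RBP.
    Both bound states release RBP back to the free receptor at rate k_off.
    The flux equals r times the steady-state probability of bound holo-RBP.
    """
    return r * a / (r + k_off + a + a * r / k_off)


def best_k_off(a: float, r: float) -> float:
    """Dissociation rate that maximizes uptake_flux (where the denominator is smallest)."""
    return math.sqrt(a * r)


if __name__ == "__main__":
    a, r = 1.0, 1.0  # arbitrary illustrative rates
    k_star = best_k_off(a, r)
    # 35-fold sits in the middle of the 30- to 40-fold affinity change discussed above.
    for label, k in (("35x tighter", k_star / 35), ("optimum", k_star), ("35x weaker", k_star * 35)):
        print(f"{label:11s} k_off={k:7.3f} -> flux {uptake_flux(a, r, k):.3f}")
```

In this scheme, a 35-fold tighter receptor stays occupied by spent RBP for most of the cycle and delivers about a tenth of the optimal flux. This mirrors, only qualitatively, the sequestration effect described for the super-binding mutants.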
Cryo-electron microscopy (cryo-EM) of single-particle specimens is used to determine the structure
of proteins and macromolecular complexes without the need for crystals. Recent advances
in detector technology and software algorithms now allow images of unprecedented quality to 
be recorded and structures to be determined at near-atomic resolution. However, compared with 
X-ray crystallography, cryo-EM is a young technique with distinct challenges. This primer explains 
the different steps and considerations involved in structure determination by single-particle cryo- 
EM to provide an overview for scientists wishing to understand more about this technique and 
the interpretation of data obtained with it, as well as a starting guide for new practitioners. 



Introduction 

Cryo-electron microscopy (cryo-EM) has the ability to provide 
3D structural information of biological molecules and assem- 
blies by imaging non-crystalline specimens (single particles). 
Although the development of the cryo-EM technique began in 
the 1970s, in the last decade the achievement of near-atomic 
resolution (<4 Å) has attracted wide attention to the approach.

The remarkable progress in single-particle cryo-EM in the last 
2 years has primarily been enabled by the development of direct 
electron detector device (DDD) cameras (Faruqi and McMullan, 
2011; Li et al., 2013a; Milazzo et al., 2011). DDD cameras have 
a superior detective quantum efficiency (DQE), a measure of 
the combined effects of the signal and noise performance of 
an imaging system (McMullan et al., 2009), and the underlying 
complementary metal-oxide semiconductor (CMOS) technology 
makes it possible to collect dose-fractionated image stacks, 
referred to as movies, that allow computational correction of 
specimen movements (Bai et al., 2013; Campbell et al., 2012; 
Li et al., 2013a). Together, these features produce images of 
unprecedented quality, which, in turn, improves the results of 
digital image processing. In parallel, the continually increasing 
computer power allows the use of increasingly sophisticated 
image processing algorithms, resulting in greatly improved and 
more reliable 3D density maps (see also Cheng, 2015, this issue).
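The idea behind computational motion correction of dose-fractionated movies can be sketched in a few lines of NumPy. The sketch is a simplified whole-frame, integer-pixel alignment: each frame is aligned to the first by FFT cross-correlation, shifted, and summed. It is offered only as an illustration under those assumptions. It is not the algorithm of Li et al. (2013a), which is more elaborate. The function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def integer_shift(reference: np.ndarray, frame: np.ndarray) -> tuple[int, int]:
    """Return (dy, dx) such that np.roll(frame, (dy, dx), axis=(0, 1)) best matches reference.

    By the cross-correlation theorem, the inverse FFT of
    FFT(reference) * conj(FFT(frame)) peaks at the aligning shift.
    """
    xcorr = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))).real
    dy, dx = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    ny, nx = reference.shape
    # Map wrap-around peak indices to signed shifts.
    if dy > ny // 2:
        dy -= ny
    if dx > nx // 2:
        dx -= nx
    return int(dy), int(dx)


def align_and_sum(frames: list[np.ndarray]) -> tuple[np.ndarray, list[tuple[int, int]]]:
    """Align every movie frame to the first one; return the summed image and the shifts.

    np.roll wraps pixels around the edges, a simplification of real drift.
    """
    reference = frames[0]
    total = np.zeros(reference.shape, dtype=float)
    shifts = []
    for frame in frames:
        dy, dx = integer_shift(reference, frame)
        shifts.append((dy, dx))
        total += np.roll(frame, (dy, dx), axis=(0, 1))
    return total, shifts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(64, 64))
    drift = [(0, 0), (2, -3), (5, 1), (-4, 6)]  # simulated frame-to-frame drift
    movie = [np.roll(image, d, axis=(0, 1)) for d in drift]
    summed, shifts = align_and_sum(movie)
    print(shifts)
```

Summing the raw frames would smear the image across the drift path. Summing after alignment recovers a sharp image, which is the benefit that dose-fractionated movies make possible.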

Much effort has been invested in simplifying and automating 
the collection of EM images and the use of image processing 
software (reviewed in Lyumkis et al., 2010). The problematic 
issue with single-particle EM, however, is that there is still no 
objective quality criterion that is simple and easy to use, such 
as the R-free value in X-ray crystallography, that would allow 
one to assess whether the determined density map is accurate 
or not. Even the resolution of a density map remains subject to 
controversies. The remaining unresolved issues may not always
be fully appreciated by new practitioners and, if overlooked, can
lead to questionable results. A recent example is the 6-Å-resolution
structure of the HIV-1 envelope glycoprotein (Mao et al.,
2013), which prompted a number of commentaries questioning 
the validity of the structure (Henderson, 2013; Subramaniam, 
2013; van Heel, 2013). This primer seeks to inform about the 
practical nuts and bolts behind determining a structure by sin- 
gle-particle cryo-EM and to guide new practitioners through 
the workflow (Figure 1) and important caveats and consider- 
ations. Also, as these authors’ opinions may not always be 
shared by everybody in the field, the reader is encouraged to 
consult other texts on single-particle EM, such as Bai et al.
(2015), Frank (2006), Lau and Rubinstein (2013), Milne et al. 
(2013), and Orlova and Saibil (2011). 

Protein Purification for Single-Particle Cryo-EM 

Single-particle EM depends on the computational averaging of 
thousands of images of identical particles. If particles exhibit 
variable conformation or composition (heterogeneity), more ho- 
mogeneous subsets can be generated using classification pro- 
cedures (more below). However, whenever possible, structural 
heterogeneity should be minimized through biochemical means 
to simplify structure determination. Biochemical analyses by 
SDS-PAGE and gel-filtration chromatography are not sufficient 
to assess whether a sample is suitable for EM analysis, as appar- 
ently intact complexes can be a mixture of compositionally 
different sub-complexes, and even compositionally homoge- 
neous complexes can potentially adopt many different confor- 
mations. The most informative way to judge the quality of a 
protein sample is to visualize it by negative-stain EM. In addition 
to providing high contrast, the negative staining procedure also 
tends to induce proteins to adsorb to the carbon film in one 
or only a few preferred orientations, making it easier to assess
sample homogeneity (Ohi et al., 2004). The kind of information
negative-stain EM provides is described in Supplemental Information.

Figure 1. The Steps Involved in Structure Determination by Single-Particle Cryo-EM

A single-particle project should start with a characterization of the specimen in negative stain (left arm of the workflow). Only once the EM images, or potentially 2D class averages, are satisfactory, i.e., the particles are monodisperse and show little aggregation and a manageable degree of heterogeneity (“low-resolution” sample refinement), is the sample ready for analysis by cryo-EM (right arm of the workflow). The images, 2D class averages, and 3D maps obtained with vitrified specimens may indicate that the sample requires further improvement to reach near-atomic resolution (“high-resolution” sample refinement).
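A one-dimensional toy example, standing in for 2D particle images, shows why averaging requires identical particles. A "particle" profile is buried in Gaussian noise. Averaging N noisy copies reduces the noise standard deviation by a factor of sqrt(N). Averaging a 50:50 mixture of two conformations, by contrast, converges to a blend that matches neither. All profiles, noise levels, counts, and helper names are invented for illustration.

```python
import numpy as np


def noisy_copies(signal: np.ndarray, n: int, noise_sd: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Return n copies of `signal`, each with independent Gaussian noise added."""
    return signal + rng.normal(scale=noise_sd, size=(n, signal.size))


def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two profiles."""
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
state_a = np.exp(-((x - 0.3) / 0.05) ** 2)  # toy "conformation A"
state_b = np.exp(-((x - 0.7) / 0.05) ** 2)  # toy "conformation B"

few = noisy_copies(state_a, 10, 2.0, rng).mean(axis=0)
many = noisy_copies(state_a, 5000, 2.0, rng).mean(axis=0)
mixed = np.vstack([noisy_copies(state_a, 2500, 2.0, rng),
                   noisy_copies(state_b, 2500, 2.0, rng)]).mean(axis=0)

print(f"10 identical particles:   r = {corr(few, state_a):.2f}")
print(f"5000 identical particles: r = {corr(many, state_a):.2f}")
print(f"5000 mixed particles:     r = {corr(mixed, state_a):.2f} vs A, "
      f"{corr(mixed, state_b):.2f} vs B")
```

The homogeneous average approaches the true profile, while the mixed average remains a superposition of both states. This is the motivation for reducing heterogeneity biochemically, or separating it by classification, before averaging.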

Structural heterogeneity can be caused by compositional or 
conformational variability of the target. Compositional heteroge- 
neity, typically the result of sub-stoichiometric components or 
dissociation of loosely associated subunits, can be addressed 
in various ways. Ideally, buffer conditions can be found that 
stabilize the target complex. A promising approach to identify 
suitable buffer conditions is the Thermofluor-based screening 
approach (Ericsson et al., 2006). In the case of a sub-stoichio- 
metric subunit, this subunit can be tagged for affinity purification, 
thus increasing the fraction of complexes containing it in the final 
preparation. An approach that has proven useful in reducing 
compositional heterogeneity is mild chemical cross-linking with 
glutaraldehyde. More control over the cross-linking reaction is 
obtained with the GraFix technique, in which the sample is
centrifuged into a combined glycerol/glutaraldehyde gradient
(Kastner et al., 2008). A variation of this approach is “on column” 
cross-linking, in which the sample is cross-linked over a size- 
exclusion column (Shukla et al., 2014). Whichever approach is 
used, one must keep in mind that cross-linking can introduce ar- 
tifacts. For example, flexible extensions can become glued 
together, resulting in a non-physiological structure. Also, if a 
complex can adopt different conformations, cross-linking can 
stabilize just one particular state, typically the most compact
organization (e.g., Shukla et al., 2014). Hence, the native sample must always
be analyzed as well to understand how cross-linking
affects the structure of the target.

Conformational heterogeneity tends to be more difficult to 
overcome, especially if one or several domains are flexibly teth- 
ered to the remainder of a protein. In this case, structural analysis 
may be restricted to negative-stain EM studies. Alternatively, 
chemical cross-linking can potentially be used to minimize the 
conformational heterogeneity, but the physiological relevance 
of the resulting structures will have to be carefully assessed. 
Another way to reduce conformational heterogeneity is to lock 
the target in a defined functional state, which can sometimes 
be accomplished by adding substrates, inhibitors, ligands, co- 
factors, or any other molecule affecting the function of the target. 

The greatly improved image quality provided by DDD cameras 
and the availability of ever more sophisticated image-processing 
software have made structural heterogeneity more manageable. 
Still, investing time to minimize structural heterogeneity by 
biochemical tools will always simplify subsequent image pro- 
cessing steps, and it will substantially reduce the risk of obtain- 
ing incorrect density maps. Every new project should thus 
always start with an optimization phase, in which negative-stain 
EM is used as a tool to optimize protein purification (Figure 1). In 
rare cases, negative staining will introduce artificial heterogene- 
ity. The only option to exclude this possibility is to look at vitrified 
specimens by cryo-EM. 

Specimen Preparation for Single-Particle Cryo-EM 

Before a biological specimen can be imaged, it has to be pre- 
pared so it survives the vacuum of the electron microscope, 
which causes sample dehydration, and the exposure to elec- 
trons, which results in radiation damage (the deposition of en- 
ergy on the specimen by inelastic scattering events that causes 
breakage of chemical bonds and ultimately structural collapse). 
The most commonly used preparation techniques, negative 
staining and vitrification, are briefly discussed in Supplemental 
Information. 

Specimens used for single-particle EM usually consist of puri- 
fied sample on a carbon film with a support structure. The sup- 
port structure is most commonly a copper grid, and the carbon 
film can either be a continuous film, typically used to prepare 
negatively stained samples, or a holey film, commonly used to 
prepare vitrified specimens. A problem with EM grids is that 
thin carbon films are not very stable and are poor conductors 
at low temperature. This is thought to contribute to the occur- 
rence of beam-induced movement, which can degrade image 
quality. Therefore, different grid designs have been explored to increase the conductivity of EM grids, such as using doped silicon carbide as the substrate (Cryomesh; Yoshioka et al., 2010), and to make them more mechanically stable, such as using gold supports (Russo and Passmore, 2014). Before the specimen can be applied, grids have to be rendered hydrophilic, which is typically done with a glow discharger (or, less commonly, with a plasma cleaner).

Cell 161 , April 23, 2015 ©2015 Elsevier Inc. 439

Figure 2. Single-Particle Cryo-EM Images with Motion Correction

Most data recorded with DDD cameras are dose-fractionated image stacks (movies) that can be motion-corrected.

(A) A typical cryo-EM image of vitrified archaeal 20S proteasome particles embedded in a thin layer of vitreous ice. The image is the sum of the raw movie frames without motion correction.

(B) Trace of motion of all movie frames determined using a whole-frame motion-correction algorithm (Li et al., 2013a). Note that the movement between frames is large at the beginning but then slows down.

(C) Left: the power spectrum calculated from the sum of the raw movie frames without motion correction. Right: the power spectrum calculated from the sum of movie frames after motion correction. Motion correction restores Thon rings to close to 3-Å resolution (dashed circle).

(D) Sum of the movie frames that were shifted according to the shifts shown in (B). Note that the images shown in (A) and (D) are indistinguishable by eye, but differ significantly in the quality of the Thon rings seen in their power spectra (C).

A perfect vitrified specimen is characterized by an amorphous ice layer of sufficient thickness to accommodate the particles (but ideally not much thicker, so that particles are clearly visible), and by particles that are well distributed across the field of view and adopt a wide range of orientations. A thin layer of vitrified ice is reasonably transparent and allows particles to be seen clearly (Figures 2 and S1A), while crystalline ice adds a strong texture of dark contrast (bend contours) that usually disguises the embedded particles (Figure S1B).

Semi-automated plungers, such as Vitrobot (FEI) and Cryoplunge (Gatan), have made it much easier to reproducibly obtain high-quality vitrified specimens. However, care has to be taken to transfer the grids quickly between plunger and cryo-specimen holder and to minimize exposure of the liquid nitrogen to air to avoid ice contamination (Figure S1C). An occasional problem is ice that has the appearance of “leopard skin” (Figure S1D). It is unclear what causes this pattern and how it can be avoided, but particles picked from images of such ice areas can still yield reliable 3D maps.

The ice layer should be as thin as possible to achieve high contrast between the molecules and the surrounding ice and to minimize defocus spread due to different heights of the molecules in the ice layer, which can hamper high-resolution structure determination. Importantly, if particles cannot be seen reasonably easily by eye, the sample should not be used for data collection. Parameters that affect ice thickness are described in Supplemental Information. The ice layer tends to be thicker around the edge of a hole and thinner in the center. Large molecules, such as viruses and ribosomes, may thus be excluded from the center of a hole. This effect is stronger with specimens containing detergent, which lowers the surface tension and also makes it more challenging to produce thin ice. If thin ice is desired, it helps to use holey carbon grids with smaller holes.

A good vitrified specimen shows a high density of molecules in different orientations. Having many particles in a hole reduces the number of images that have to be collected, but ideally the molecules should not touch each other. A problem that is often encountered is that only very few molecules are observed in the holes of the carbon film. A large percentage of molecules is removed during blotting with filter paper, and preparation of vitrified specimens thus requires a much higher sample concentration than preparation of negatively stained specimens. It is not unusual, however, that even with very highly concentrated samples, few particles are seen in the holes. Reasons for this problem can be that the molecules preferentially adsorb to the carbon film, depleting them from the holes, or that they denature as they come into contact with the air/water interface due to the surface tension. An effective solution to deal with the preferential adsorption to the carbon film is to apply the sample twice. The first application saturates the carbon film with protein, so more particles are likely to remain in the holes when the sample is applied a second time. Alternatively, the grid can be covered by a thin carbon or graphene support film or by a lipid monolayer to which the molecules can adsorb. However, with the exception of graphene, additional support films will reduce image contrast, and all substrates have the potential to induce molecules to adopt preferred orientations. Finally, the grid can be decorated with a self-assembled monolayer to passivate the support film and drive the molecules into the holes (Meyerson et al., 2014). Protein denaturation at the air/water interface can be addressed by using thicker ice (which will, however, reduce image contrast), by using a support film that adsorbs the molecules (but also reduces image contrast), or by chemically fixing the sample before vitrification (which has the potential, however, to affect the structure).

Occasionally particles adopt preferred orientations, presum- 
ably due to interactions with the air/water interface. This causes 
a problem for the reconstruction of a 3D density map, which re- 
quires multiple views. One can attempt to overcome this prob- 
lem by using thicker ice, by adding low amounts of detergent 
(lowering the surface tension of the air/water interface), or by us- 
ing a thin support film to which the molecules can adsorb and 
which will thus keep them away from the air/water interface. 
One can also try to change the glow-discharge parameters or 
to modify the protein, e.g., by adding/removing affinity tags. If 
none of these approaches is successful, images of tilted specimens
can be collected, but this is technically very challenging and usually
prevents achieving high resolution.

Image Acquisition for Single-Particle Cryo-EM 

Structure determination by single-particle cryo-EM, especially if 
near-atomic resolution is targeted, requires acquisition of high- 
quality images, i.e., images with high contrast and with sufficient 
resolution to answer the biological questions being asked. In 
addition, particularly for high-resolution projects, high efficiency 
is beneficial to make them economical, i.e., one should be able to 
collect a large number of micrographs within a reasonable time- 
frame. Thus, automation of key steps may be called for. While 
modern electron microscopes are capable of delivering resolutions
better than 2 Å, collection of good-contrast, high-resolution
images of vitrified specimens remains challenging. It is therefore 
critical not only to align the electron microscope with great 
care but also to choose appropriate imaging conditions. Adjust- 
able settings include, but are not limited to, selection of the 
condenser aperture and spot size, reduction of imaging aberra- 
tion by coma-free alignment (all briefly discussed in Supple- 
mental Information), as well as issues related to the optimization 
of image contrast, such as appropriate defocus settings, selec- 
tion of objective aperture, and the electron dose used. To learn 
about contrast enhancement by using phase plates, the reader 
is referred to Glaeser (2013). 

The contrast of vitrified biological specimens is very low, and if 
images were taken in focus, they would contain little, if any, use- 
ful information. Images are therefore taken in bright-field mode 
of the electron microscope while applying underfocus (Frank, 
2006). Given a thin object, images are linear projections of the 
Coulomb potential of the specimen, the fundamental property 
necessary for subsequent computational reconstruction of its 
3D structure. The images are modulated by the contrast transfer 
function (CTF), a quasi-periodic sine function in reciprocal 



space, the periodicity of which depends, among other parame- 
ters, on the defocus setting (Wade, 1992; and Supplemental 
Information). Furthermore, the amplitudes of the high spatial fre- 
quencies (high-resolution detail) in an image are attenuated by 
an envelope function of the CTF. Its rate of decline depends on 
the spatial coherence of the electron beam, and it increases 
with increasing image defocus. Therefore, a higher defocus 
boosts the low-resolution image contrast but weakens the 
high-resolution contrast, limiting the frequency range of useful 
information. Thus, it is best to use the smallest possible defocus 
that still creates sufficient low-resolution image contrast to 
clearly see the particles. This means that for large molecules, 
e.g., viruses, a small underfocus can be used. For small particles 
(molecular mass less than 200 kDa), however, it is often neces- 
sary to underfocus by a few micrometers, which will limit the 
resolution that can be achieved. Importantly, as the CTF has mul- 
tiple zero crossings, some information within a single image is 
lost, which is the reason why images have to be collected at 
different underfocus settings to sample the entire reciprocal 
space (Penczek, 2010a; Zhu et al., 1997). 
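The defocus trade-off described above can be made concrete with a short sketch. It assumes one common convention, CTF(s) = −[√(1−A²) sin χ(s) + A cos χ(s)] with χ(s) = πλΔf s² − (π/2)C_s λ³ s⁴ (Δf > 0 for underfocus), together with a Gaussian model of the spatial-coherence envelope; sign conventions differ between software packages, and the default values (300 kV, C_s = 2.7 mm, 7% amplitude contrast, 0.01 mrad illumination semi-angle) are illustrative assumptions rather than recommendations. The script reports where the first CTF zero falls and how strongly the envelope damps 4-Å information at several defocus settings.

```python
import math


def electron_wavelength(kv):
    """Relativistic electron wavelength (Angstrom) for an accelerating voltage in kV."""
    v = kv * 1e3  # volts
    return 12.2643247 / math.sqrt(v * (1.0 + 0.978466e-6 * v))


def ctf(s, defocus_um, cs_mm=2.7, kv=300.0, amp_contrast=0.07):
    """CTF at spatial frequency s (1/Angstrom); positive defocus means underfocus."""
    lam = electron_wavelength(kv)
    df = defocus_um * 1e4  # micrometres -> Angstrom
    cs = cs_mm * 1e7       # millimetres -> Angstrom
    chi = math.pi * lam * df * s**2 - 0.5 * math.pi * cs * lam**3 * s**4
    return -(math.sqrt(1.0 - amp_contrast**2) * math.sin(chi) + amp_contrast * math.cos(chi))


def spatial_envelope(s, defocus_um, alpha_mrad=0.01, cs_mm=2.7, kv=300.0):
    """Gaussian model of the spatial-coherence envelope (one common form, assumed here)."""
    lam = electron_wavelength(kv)
    df, cs, alpha = defocus_um * 1e4, cs_mm * 1e7, alpha_mrad * 1e-3
    # The argument is proportional to the gradient of chi, so it grows with defocus.
    return math.exp(-((math.pi * alpha / lam) ** 2) * (cs * lam**3 * s**3 - df * lam * s) ** 2)


def first_zero(defocus_um, cs_mm=2.7, kv=300.0, amp_contrast=0.07):
    """Spatial frequency (1/Angstrom) of the first CTF zero, where chi + asin(A) = pi."""
    lam = electron_wavelength(kv)
    df, cs = defocus_um * 1e4, cs_mm * 1e7
    a = 0.5 * math.pi * cs * lam**3
    b = math.pi * lam * df
    c = math.pi - math.asin(amp_contrast)
    # Smaller root of a*u^2 - b*u + c = 0 with u = s^2, in a numerically stable form.
    u = 2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))
    return math.sqrt(u)


for df in (0.5, 1.5, 3.0):
    print(f"{df:.1f} um underfocus: first zero at {1.0 / first_zero(df):.1f} A, "
          f"envelope at 4 A = {spatial_envelope(0.25, df):.3f}")
```

Larger underfocus moves the first zero toward lower resolution and lowers the envelope at 4 Å, which is the compromise the paragraph describes. Because the zero positions move with defocus, images taken at several defocus values together cover frequencies that any single image loses.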

The use of an objective aperture increases amplitude contrast 
by cutting off electrons scattered at high angles. However, as it 
also sets a cut-off limit for the resolution, a relatively large objective
aperture has to be used for high-resolution single-particle
cryo-EM imaging (e.g., 70 µm or 100 µm).

Using a higher electron dose also increases image contrast, 
but higher electron doses will increase radiation damage. Therefore,
for single-exposure images and to achieve high resolution,
the electron dose is typically kept below ~20 e⁻/Å². Much higher
electron doses can be used when movies are recorded (see 
below). The dose rate also needs to be considered and depends 
on the type of detector being used for imaging. For imaging on 
film or when a charge-coupled device (CCD) camera is used, a 
high dose rate (high beam intensity) is typically used to keep 
the exposure short (~1 s or less), which minimizes the extent 
of specimen drift during exposure. Short exposures are also 
preferred when integrating DDD cameras are used to collect sin- 
gle-exposure images, but longer exposures can be used when 
they are operated in movie mode, which reduces or eliminates 
the problem of specimen drift (see below). The situation is 
different for electron-counting DDD cameras. To ensure that 
electrons are counted properly, the dose rate must be kept 
below ~10 e⁻/pixel/s (based on current technology) on the
camera (Li et al., 2013b; Ruskin et al., 2013). Higher dose rates 
adversely affect electron counting, thus lowering the DQE and 
image contrast. 
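The dose bookkeeping above reduces to simple arithmetic: the dose on the specimen in e⁻/Å² is the dose rate per camera pixel multiplied by the exposure time and divided by the pixel area at the specimen level. The sketch below assumes that relation and treats the ~10 e⁻/pixel/s counting limit quoted in the text as a hard cut-off; the function names and the example numbers (1.2-Å pixels, 8 e⁻/pixel/s, 40 e⁻/Å² spread over 40 frames) are hypothetical.

```python
import math

COUNTING_LIMIT = 10.0  # approximate e-/pixel/s limit for counting cameras, as quoted in the text


def total_dose(dose_rate_e_per_px_s, exposure_s, pixel_size_A):
    """Accumulated specimen dose (e-/A^2) for a camera dose rate and exposure time."""
    return dose_rate_e_per_px_s * exposure_s / pixel_size_A**2


def exposure_for_dose(target_e_per_A2, dose_rate_e_per_px_s, pixel_size_A, counting=True):
    """Exposure time (s) needed to reach a target dose; rejects rates above the counting limit."""
    if counting and dose_rate_e_per_px_s > COUNTING_LIMIT:
        raise ValueError("dose rate too high for reliable electron counting")
    return target_e_per_A2 * pixel_size_A**2 / dose_rate_e_per_px_s


# Hypothetical example: 1.2 A pixels, 8 e-/pixel/s, 40 e-/A^2 recorded as 40 movie frames.
t = exposure_for_dose(40.0, 8.0, 1.2)
print(f"exposure {t:.1f} s, {40.0 / 40:.1f} e-/A^2 per frame")
```

With these example values the exposure lasts several seconds, much longer than the ~1-s exposures used with film or CCD cameras, so drift during the exposure becomes a larger concern.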

A factor contributing to the recent improvement of attainable 
resolution in cryo-EM is the movie mode available on some 
DDD cameras. Here, the total electron dose is fractionated into 
a series of image frames that can be aligned to compensate 
for specimen drift and beam-induced movement, thus reducing 
image blurring (Figure 2) (Brilot et al., 2012; Campbell et al., 2012;
Li et al., 2013a). After alignment, the frames are averaged, and 
the resulting image is used for subsequent structure determina- 
tion. Movies are made possible by the fast readout and the 
“rolling shutter” mode of CMOS detectors that underlie all 
DDD cameras and some newer scintillator-based cameras. 
Some software packages also allow for sub-frame alignment 
to account for local motions that occur during beam exposure
(Rubinstein and Brubaker, 2014; Scheres, 2014). Movies also
offer the possibility to optimize the overall signal-to-noise ratio
(SNR) in images of specimens affected by radiation damage.
Early frames correspond to a low electron dose and therefore 
contain high-resolution signal from the least damaged spec- 
imen. However, early frames are also often still affected by fast 
specimen movement (Figure 2B), blurring the high-resolution in- 
formation. While specimen movement typically slows down and 
affects later frames less, these correspond to a higher accumu- 
lated dose and increasingly lack high-resolution information. 
When movie frames are averaged, a relative weighting can be 
applied that optimizes the signal in the final average (Campbell 
et al., 2012; Scheres, 2014). As an intermediate measure to 
improve high-resolution image contrast, one can exclude the 
initial two or three frames (which are often still affected by high 
initial specimen movement), as well as the later frames that 
correspond to a total dose of ~20 e⁻/Å² and higher from the
frame averages. However, this strategy results in the loss of 
low-resolution contrast. Therefore, it may currently be best to 
use images containing all the movie frames in the alignment 
step during image processing and to use images without the 
initial and final frames to calculate the final 3D map (Li et al., 
2013a). 
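Whole-frame motion correction and the frame-selection strategy described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the algorithm of Li et al. (2013a): each frame is aligned to the running sum of the frames already aligned, using the peak of an FFT-based cross-correlation; shifts are whole pixels only; and local (sub-frame) motion is ignored. The dose threshold is applied to the cumulative dose at the end of each frame, which is one possible reading of the rule in the text, and `skip_first` and `max_dose` are illustrative parameters.

```python
import numpy as np


def integer_shift(ref, img):
    """Integer (dy, dx) such that np.roll(img, (dy, dx), axis=(0, 1)) best matches ref,
    taken from the peak of the circular cross-correlation computed with FFTs."""
    cc = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    ny, nx = ref.shape
    # Report signed shifts in (-n/2, n/2] instead of [0, n).
    dy = dy - ny if dy > ny // 2 else dy
    dx = dx - nx if dx > nx // 2 else dx
    return int(dy), int(dx)


def align_movie(frames):
    """Align each frame to the running sum of the frames aligned so far."""
    aligned = [frames[0].astype(float)]
    shifts = [(0, 0)]
    reference = aligned[0].copy()
    for frame in frames[1:]:
        shift = integer_shift(reference, frame)
        moved = np.roll(frame, shift, axis=(0, 1)).astype(float)
        aligned.append(moved)
        shifts.append(shift)
        reference += moved
    return aligned, shifts


def select_frames(n_frames, dose_per_frame, skip_first=2, max_dose=20.0):
    """Indices kept for the final average: drop the first `skip_first` frames and any
    frame whose cumulative dose at its end exceeds `max_dose` (e-/A^2)."""
    return [i for i in range(n_frames)
            if i >= skip_first and (i + 1) * dose_per_frame <= max_dose]


def frame_sums(aligned, dose_per_frame, **kwargs):
    """Sum of all aligned frames and sum of the dose-selected frames."""
    keep = select_frames(len(aligned), dose_per_frame, **kwargs)
    return np.sum(aligned, axis=0), np.sum([aligned[i] for i in keep], axis=0)
```

`frame_sums` returns both averages mentioned in the text: the sum of all aligned frames, which keeps the low-resolution contrast useful during particle alignment, and the sum without the initial and late frames, which would be used to calculate the final 3D map.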

The attainable resolution depends on the pixel size on the 
specimen level, which, in turn, depends on the effective magnification.
Both the physical pixel size of digital cameras and the exact
position of the cameras in the optical path vary. Therefore,
the image pixel size has to be calibrated not only for each
magnification but also for every microscope/camera combina- 
tion (a protocol for how to calibrate the magnification is 
described in Supplemental Information). The Nyquist theorem 
specifies that the theoretically attainable resolution is limited to 
twice the pixel size, but interpolation errors introduced by image 
processing operations and low DQE values of the detector near 
the Nyquist frequency limit the practically attainable resolution 
further (Penczek, 2010b). As a rule of thumb, the practical reso- 
lution limit is closer to three times the pixel size. 
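The relationships in this paragraph can be written as a few helper functions. The sketch assumes that the pixel size at the specimen level is the physical camera pixel divided by the calibrated (not nominal) magnification, and it uses the 2× (Nyquist) and ~3× (rule-of-thumb) factors from the text. The 5-µm camera pixel and the 50,000× magnification in the example are hypothetical values.

```python
def specimen_pixel_size(physical_pixel_um, magnification):
    """Pixel size at the specimen level (Angstrom) from camera pixel and calibrated magnification."""
    return physical_pixel_um * 1e4 / magnification  # micrometres -> Angstrom


def resolution_limits(pixel_size_A):
    """Nyquist limit (2 x pixel) and rule-of-thumb practical limit (~3 x pixel), in Angstrom."""
    return 2.0 * pixel_size_A, 3.0 * pixel_size_A


def max_pixel_size_for(target_resolution_A, factor=3.0):
    """Largest pixel size compatible with a target resolution under the ~3 x pixel rule."""
    return target_resolution_A / factor


px = specimen_pixel_size(5.0, 50000)  # hypothetical 5-um pixel at a calibrated 50,000x
nyquist, practical = resolution_limits(px)
print(f"pixel {px:.2f} A -> Nyquist {nyquist:.2f} A, practical ~{practical:.2f} A")
```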

Image Processing 

A significant part of the workload of a single-particle project is 
taken up by the processing of the recorded images. The main 
steps are discussed here, including correction of the microscope 
CTF, selection of particles and preparation of image stacks, gen- 
eration of an initial structure and its refinement, treatment of 
structural heterogeneity, assessment of resolution, and interpre- 
tation of the final 3D density maps. A number of software pack- 
ages exist that have been developed over the last four decades 
and are still being improved. While the development of software 
is important for the success of single-particle cryo-EM, the 
recent groundbreaking results are primarily due to the use of 
direct detectors and the recording of movies. Prior to their com- 
mon use, none of the currently employed algorithms and soft- 
ware packages was capable of yielding results comparable to 
what is now possible. After direct detectors and movies were 
adopted, near-atomic resolution was achieved with several soft- 
ware packages, including SPIDER (Frank et al., 1981), EMAN2 
(Tang et al., 2007), FREALIGN (Grigorieff, 2007), RELION 



(Scheres, 2012), and SPARX (Hohn et al., 2007). To date, 
EMAN/EMAN2 has been, and continues to be, the most popular 
software, owing to its extensive options, flexibility, and user 
friendliness. However, users new to cryo-EM may find it easier 
to start with more specialized software, such as RELION, which 
offers streamlined processing with fewer options and one main 
algorithmic approach (maximum likelihood). This primer is not 
meant to serve as a manual for any specific image processing 
software package, but instead tries to relate basic concepts, 
which may be implemented in different ways in different software 
packages. 

Estimation of CTF Parameters and Correction for Its Effects

The accurate estimation of CTF parameters is important for both 
the initial evaluation of micrograph quality and subsequent struc- 
ture determination. To calculate the CTF, the parameters that 
have to be known are acceleration voltage, spherical aberration, 
defocus, astigmatism, and percentage of amplitude contrast. 
Voltage and spherical aberration are instrument parameters 
that are usually used without further refinement (although the 
value for the spherical aberration provided by the manufacturer 
may not be completely accurate). The defocus is set during 
data collection, but the setting is only approximate. More accu- 
rate values for defocus and astigmatism are obtained by fitting a 
calculated CTF pattern (e.g., Mindell and Grigorieff, 2003) to the 
Thon rings (semi-circular intensity oscillations induced by the 
CTF seen in the power spectrum of the image [Thon, 1966]). 
The contribution of the amplitude contrast is typically assumed
to be 5%-10% for cryo-EM images.
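Fitting a calculated CTF to the Thon rings can be illustrated with a one-dimensional grid search. This is a deliberately reduced sketch, not the algorithm of any particular program. It assumes a rotationally averaged, background-subtracted power spectrum; it ignores astigmatism and envelope functions; it fixes voltage (300 kV), C_s (2.7 mm), and amplitude contrast (7%); and it scores each trial defocus by the Pearson correlation between CTF² and the observed spectrum. The synthetic "observed" spectrum is generated from a known defocus of 2.13 µm, so the fitted value can be compared against it.

```python
import math

LAMBDA_300KV = 0.019687  # electron wavelength at 300 kV, Angstrom
CS_A = 2.7e7             # assumed spherical aberration of 2.7 mm, in Angstrom


def ctf_squared(s, defocus_um, amp=0.07):
    """Squared CTF (what Thon rings show), ignoring astigmatism, envelopes and background."""
    df = defocus_um * 1e4
    chi = math.pi * LAMBDA_300KV * df * s**2 - 0.5 * math.pi * CS_A * LAMBDA_300KV**3 * s**4
    return (math.sqrt(1 - amp**2) * math.sin(chi) + amp * math.cos(chi)) ** 2


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fit_defocus(freqs, spectrum, lo_um=0.5, hi_um=4.0, step_um=0.01):
    """Grid search for the defocus whose CTF^2 best correlates with a 1D power spectrum."""
    n_steps = int(round((hi_um - lo_um) / step_um))
    best_df, best_cc = None, -2.0
    for k in range(n_steps + 1):
        df = lo_um + k * step_um
        cc = pearson([ctf_squared(s, df) for s in freqs], spectrum)
        if cc > best_cc:
            best_df, best_cc = df, cc
    return best_df, best_cc


# Synthetic, noise-free spectrum sampled between 1/50 and 1/5 A^-1.
freqs = [1 / 50 + i * (1 / 5 - 1 / 50) / 199 for i in range(200)]
observed = [ctf_squared(s, 2.13) for s in freqs]
print(fit_defocus(freqs, observed))
```

Real spectra also contain a slowly varying background and the envelope decay, which is why practical programs subtract the background and fit in two dimensions to recover astigmatism as well.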

Once the CTF parameters have been determined, and as long
as a set of particle views that differ by defocus settings is avail- 
able, correction for the CTF effects is possible and straightfor- 
ward (Penczek, 2010a). It can be done for both amplitudes 
and phases (full CTF correction) or only for the phases (phase 
flipping). For more detail on CTF estimation and correction, see 
Supplemental Information. 
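The difference between phase flipping and full CTF correction, and the reason several defocus values are needed, can be shown on a 1D toy signal. In this sketch (an illustration under stated assumptions, not the implementation in any package), phase flipping multiplies the Fourier transform by the sign of the CTF, and full correction uses a Wiener-type filter Σᵢ CTFᵢ Fᵢ / (Σᵢ CTFᵢ² + 1/SNR) with an assumed constant SNR. The wavelength, C_s, amplitude contrast, and defocus values are illustrative.

```python
import numpy as np


def ctf_1d(s, defocus_A, lam=0.019687, cs_A=2.7e7, amp=0.07):
    """1D CTF (positive defocus = underfocus), vectorized over spatial frequency s (1/Angstrom)."""
    chi = np.pi * lam * defocus_A * s**2 - 0.5 * np.pi * cs_A * lam**3 * s**4
    return -(np.sqrt(1 - amp**2) * np.sin(chi) + amp * np.cos(chi))


def phase_flip(image_ft, ctf):
    """Phase-only correction: flip the sign of Fourier components where the CTF is negative."""
    return np.sign(ctf) * image_ft


def wiener_combine(image_fts, ctfs, snr=10.0):
    """Amplitude and phase correction of several images of the same object recorded at
    different defoci, using a Wiener-type filter with an assumed constant SNR."""
    numerator = sum(c * f for c, f in zip(ctfs, image_fts))
    denominator = sum(c**2 for c in ctfs) + 1.0 / snr
    return numerator / denominator
```

Phase flipping makes every Fourier component point in the right direction but leaves its amplitude scaled by |CTF|, so components near CTF zeros remain lost. Adding an image at a second defocus fills in frequencies where the first CTF is near zero, so the combined Wiener estimate is closer to the true spectrum than an estimate from a single image.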

Ultimately, the determined 3D structure should be corrected 
for the reciprocal space envelope functions that suppress 
high-frequency information, and thus visual resolvability of 
map details. These envelope functions describe effects of micro- 
scope optics, limitations of digital scanners and cameras, and 
errors in orientation parameters assigned to particle images 
(Jensen, 2001; see also the section on power spectrum adjustment in
Supplemental Information).
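A common, simple way to approximate the combined envelope functions is a single Gaussian falloff exp(−B s²/4), with s the spatial frequency in Å⁻¹ and B a "temperature factor" in Å². This is a swapped-in technique offered for illustration (usually called B-factor sharpening), not necessarily the procedure described in the Supplemental Information. The sketch assumes that B has already been estimated; a negative B boosts high frequencies, and an optional resolution limit zeroes everything beyond the claimed resolution so that noise there is not amplified.

```python
import numpy as np


def apply_b_factor(volume, pixel_A, b_factor, res_limit_A=None):
    """Multiply the Fourier transform of a map by exp(-B s^2 / 4).

    Positive B attenuates high frequencies; negative B boosts them (sharpening).
    If res_limit_A (Angstrom) is given, frequencies beyond it are set to zero.
    """
    axes = [np.fft.fftfreq(n, d=pixel_A) for n in volume.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    s2 = sum(g**2 for g in grids)  # squared spatial frequency, 1/Angstrom^2
    filt = np.exp(-b_factor * s2 / 4.0)
    if res_limit_A is not None:
        filt[s2 > 1.0 / res_limit_A**2] = 0.0
    return np.fft.ifftn(np.fft.fftn(volume) * filt).real
```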

Figure 3. Principle of the K-means Algorithm Used in Single-Particle EM Structure Determination Protocols

(A) In the basic K-means algorithm, the particle images are compared with a set of class averages using a correlation measure that yields the class assignment. Based on the updated class assignments, new class averages are then calculated. Simply by adding 2D alignment of the images to the templates using a correlation function, the algorithm is converted to multi-reference alignment (MRA) (indicated by text in red font).

(B) Principle of the projection matching technique used for 3D single-particle EM structure refinement. The best match of an image to a template yields the two Euler angles that were used to generate the template, while a 2D alignment step yields the third, in-plane Euler angle and the two in-plane translations, the total of five orientation parameters required for the 3D reconstruction step.

Particle Picking

Once a dataset has been collected, movies have been aligned and averaged (if applicable), and good micrographs have been selected (e.g., based on Thon rings being visible to high resolution in all directions), a project continues with the labor-intensive process of particle picking. The quality of the selected particles is a major factor in the subsequent analysis, as inclusion of too many poor particles may preclude successful structure determination. Moreover, methods aimed at cleaning up the selected particles are not very robust, and many artifacts pass all tests and adversely affect subsequent data processing efforts. Particles can be selected in a manual, semi-automated, or fully automated manner. In the early stages of analysis, particularly
when little is known about the shape of the protein and the dis- 
tribution of the projection views, the manual approach is prefer- 
able. A trained and careful practitioner can obtain much better 
results than automated approaches, but the risk is that humans 
tend to focus on more familiar and better visible particle views, 
thus omitting less frequently appearing orientations that may 
be needed for successful structure determination. In semi-auto- 
mated approaches, the computer performs an initial step of 
detection of putative particles in a micrograph. All candidates 
are windowed, and the user removes poor particles from the gal- 
lery of possible candidates. Fully automated procedures can be 
divided into three groups: those that rely on ad hoc steps of 
denoising and contrast enhancement followed by a search for 
regions of a given size that emerge above the background level 
(Adiga et al., 2004); those that extract orientation-independent 
statistical features from regions of micrographs that may contain 
particles and proceed with classification (Hall and Patwardhan, 
2004; Lata et al., 1995); and those that employ templates, i.e., 
either class averages of particles selected from micrographs or 
projections from a known 3D structure of the complex (Chen 
and Grigorieff, 2007; Huang and Penczek, 2004; Sigworth, 
2004). The use of fully automated procedures carries even higher 
risks of introducing bias, as positively correlating noise features 
are indistinguishable from weak but valid signal. Therefore, one 
faces the risk of eventually merely reproducing the template 
structure. The study of the HIV-1 envelope glycoprotein is a 
prominent example in which template bias likely played a 
deciding role (Mao et al., 2013). Good practice is therefore to 
rely on template-based particle picking only if particles are 
clearly visible in the micrographs. 
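Template-based picking, the third group of automated procedures described above, can be sketched as cross-correlating the micrograph with a template and taking the highest peaks. The sketch makes simplifying assumptions: a single, rotationally symmetric Gaussian template (so in-plane rotations can be ignored), no CTF treatment or local normalization, and greedy peak selection with a square exclusion zone. `gaussian_blob`, `correlation_map`, and `pick_particles` are hypothetical helpers. Because noise that happens to correlate with the template also produces peaks, a picker like this is subject to exactly the template bias discussed above.

```python
import numpy as np


def gaussian_blob(shape, center, sigma):
    """2D Gaussian of the given shape centred at (y, x)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2.0 * sigma**2))


def correlation_map(micrograph, template):
    """Circular cross-correlation with a zero-mean template whose centre is moved to the
    origin, so that correlation peaks fall on particle centres."""
    th, tw = template.shape
    padded = np.zeros_like(micrograph, dtype=float)
    padded[:th, :tw] = template - template.mean()
    padded = np.roll(padded, (-(th // 2), -(tw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(micrograph) * np.conj(np.fft.fft2(padded))).real


def pick_particles(micrograph, template, n_particles, exclusion):
    """Greedy picking: take the highest peak, blank a square around it, and repeat."""
    cc = correlation_map(micrograph, template)
    picks = []
    for _ in range(n_particles):
        y, x = np.unravel_index(np.argmax(cc), cc.shape)
        picks.append((int(y), int(x)))
        cc[max(0, y - exclusion):y + exclusion + 1, max(0, x - exclusion):x + exclusion + 1] = -np.inf
    return picks
```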

With particle coordinates identified in the micrographs, the 
particles are windowed and assembled into a stack. The initial
locations are not very precise. Therefore, the window size should
exceed the approximate particle size by at least 30% (more for 
small particles). For issues relating to aliasing and density 
normalization, see Supplemental Information. 
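Windowing can be sketched as follows, assuming the picked coordinates are particle centres in pixels. The box edge is the particle diameter plus the ≥30% margin from the text, rounded up to an even number of pixels (the even rounding is a convenience assumption, not a requirement stated here), and boxes that would extend past the micrograph edge are simply skipped.

```python
import math

import numpy as np


def box_size(particle_diameter_A, pixel_A, margin=0.3):
    """Box edge in pixels: diameter plus margin, rounded up to an even number."""
    n = math.ceil(particle_diameter_A * (1.0 + margin) / pixel_A)
    return n + (n % 2)


def extract_particles(micrograph, centres, box):
    """Window boxes centred on (y, x) coordinates; boxes crossing the edge are skipped."""
    half = box // 2
    stack = []
    for y, x in centres:
        y0, x0 = y - half, x - half
        if y0 < 0 or x0 < 0 or y0 + box > micrograph.shape[0] or x0 + box > micrograph.shape[1]:
            continue
        stack.append(micrograph[y0:y0 + box, x0:x0 + box])
    return np.stack(stack) if stack else np.empty((0, box, box), dtype=micrograph.dtype)
```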

2D Clustering and Formation of Class Averages 

The first step in single-particle EM structure determination is the 
analysis of the 2D image dataset, particularly the alignment and 
grouping of the data into homogeneous subsets. There are
several reasons why it is best to begin with 2D analysis: (1)
2D datasets typically contain image artifacts, invalid particles, or simply
empty fields that should be removed; (2) the angular distribution
of the particle views is unknown, and if the set is dominated
by just a few views, 3D analysis is unlikely to succeed; and (3)
computational ab initio 3D structure determination requires
high-SNR input data, such as high-quality class averages.

Various strategies have been proposed to deal with the prob- 
lem of alignment and clustering of large sets of single-particle 
EM images (Joyeux and Penczek, 2002; Penczek, 2008), but 
all are fundamentally rooted in the popular K-means clustering 
algorithm (Figure 3A). As most steps in single-particle EM 
analysis use a variant of this algorithm, including 2D multi-reference
alignment, 3D multi-reference refinement, and even 3D
structure refinement (projection matching), the principles and
properties of K-means clustering are described in Supplemental 
Information. 

A straightforward implementation of the K-means algorithm in 
single-particle EM analysis is 2D multi-reference alignment 
(MRA) (van Heel and Stoffler-Meilicke, 1985), a process in which 
the dataset is presented with K seed templates, and all images 
are aligned to and compared with all templates and assigned 
to the one they most resemble. The process is iterative: a new 
set of templates is computed by averaging images based on re-
sults from the initial grouping (including transformations given by 
alignment of the data in the previous step), and the whole proce- 
dure is repeated until a stable solution is reached. To accelerate 
the procedure, one can employ an additional step of principal 
component analysis (PCA) executed so that the clustering is 
actually performed using factorial coordinates, not the original 
images (for in-depth reading, see Frank, 2006). All major sin- 
gle-particle EM software packages contain a version of MRA, 
often with various heuristics aimed at improved performance, 
particularly with respect to the problem of “group collapse”: as 
MRA combines alignment with clustering, the process is unsta- 
ble in that the more common particle views produce large, 
high-SNR class averages, which in turn “attract” less common 
or more noisy images, eventually leading to the disappearance 
of less populous groups. 
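The MRA loop can be sketched in NumPy as K-means with an alignment step added. To stay short, this version aligns translationally only (no in-plane rotation), scores each image against each template by the cross-correlation peak divided by the template norm, and seeds the templates with user-supplied images. It contains none of the heuristics against group collapse mentioned above, except that an empty class keeps its previous template; the function names are hypothetical.

```python
import numpy as np


def best_shift_and_score(template, image):
    """Integer shift aligning image to template, and the normalized correlation peak."""
    cc = np.fft.ifft2(np.fft.fft2(template) * np.conj(np.fft.fft2(image))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    return (int(peak[0]), int(peak[1])), cc[peak] / np.linalg.norm(template)


def multi_reference_alignment(images, seeds, n_iter=5):
    """K-means-style MRA with translational alignment only (no in-plane rotation)."""
    templates = [np.asarray(t, dtype=float).copy() for t in seeds]
    labels = np.zeros(len(images), dtype=int)
    for _ in range(n_iter):
        aligned = []
        for i, img in enumerate(images):
            # Align the image to every template and assign it to the best-scoring one.
            results = [best_shift_and_score(t, img) for t in templates]
            k = int(np.argmax([score for _, score in results]))
            labels[i] = k
            aligned.append(np.roll(img, results[k][0], axis=(0, 1)))
        for k in range(len(templates)):
            members = [a for a, lab in zip(aligned, labels) if lab == k]
            if members:  # an empty class keeps its old template (cf. "group collapse")
                templates[k] = np.mean(members, axis=0)
    return labels, templates
```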

In light of the fundamental shortcomings of MRA (see Supple- 
mental Information), the iterative stable alignment and clustering 
(ISAC) method has been developed (Yang et al., 2012). This 
method uses a dedicated clustering algorithm to counteract 
group collapse and employs a multi-level validation strategy 
of the identified groups, thus yielding uniquely homogeneous 
classes of images (see Supplemental Information for more 
information). 

Calculation of Initial Structures 

Ab initio 3D structure determination is necessary in cases in 
which no reasonable templates or guesses for the structure 
exist. Even though new implementations of 3D structure refine- 
ment algorithms are increasingly robust, initial templates, when 
available, can contain significant errors and an attempt to 
initialize structure refinement with such templates and raw EM 
particles is likely to fail. When available, 3D templates can be 
used, e.g., a low-resolution negative-stain EM 3D reconstruc- 
tion, an appropriately filtered X-ray model or an EM map of a 
homolog (Beckmann et al., 1997). If high point-group symmetry 
is present, particularly icosahedral symmetry, some refinement 
algorithms will converge properly with random initialization. 
However, it is always better to execute all steps indicated in 
this and the previous sections, because extensive validation 
methodology built into the 2D analysis and ab initio steps signif- 
icantly increases confidence in the final outcome. 

Ab initio structure determination methods can be broadly 
divided into those that require additional experiments, typically 
in the form of tilt pairs, and those that use only data of untilted 
specimens and rely entirely on computational strategies to 
deliver the structure. 

The earliest and still the most commonly used ab initio tilt- 
based structure determination method is the random conical 
tilt (RCT) approach (Radermacher et al., 1987). Because one of 
the orientation parameters is set experimentally (tilt angle) and 
others can be computed in a robust manner (in-plane rotation, 
tilt angle correction), the method will deliver a reliable initial struc- 
ture. It is, however, difficult to collect high-tilt data of acceptable 
quality, especially for vitrified specimens, in which case charging 
and beam-induced movement can be severe. Most RCT work is 
thus done using negatively stained specimens, but the artifacts 
associated with staining (Cheng et al., 2006) and the missing
cone problem (Frank, 2006) that further degrades the quality of
the 3D map limit the utility of the resulting structures. However, 
RCT is a virtually foolproof method and its outcome will 
immensely increase the confidence in the final structure. 

Computational ab initio structure determination methods seek 
to determine five orientation parameters (three Euler angles and 
two translations) for each projection image such that the result- 
ing 3D structure is “best” in the sense of some mathematical cri- 
terion. Due to the low quality of EM data and also due to the time 
needed for the calculations, virtually all ab initio methods in use 
assume the input to be a relatively small set (<1,000) of class
averages that result from 2D analysis. Since the success of the
3D orientation search strongly depends on the data quality, it
is particularly important that the class averages used represent
homogeneous particle groups.

The earliest computational ab initio structure determination 
approach is based on the central section theorem: since Fourier 
transforms of 2D projections of a 3D object are central sections 
through the 3D Fourier transform of the object, Fourier trans- 
forms of any two projections will intersect along a line, called a 
“common line.” The common-lines approach was first imple- 
mented in IMAGIC as “angular reconstitution,” taking advantage 
of the existence of a mathematical solution for orienting three 
projections (van Heel, 1987). Thus, in angular reconstitution, 
the user selects triplets of class averages, and multiple triplets 
are then merged into a common framework, yielding the final 
3D structure. The procedure depends critically on user choices 
and one is thus advised to explore various combinations to 
gain confidence in the ultimate outcome. 
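The central section theorem behind common lines can be checked numerically on a toy volume. The sketch uses projections along the coordinate axes only, so that the central sections and their common line fall exactly on grid lines; practical common-lines methods such as angular reconstitution must instead search over arbitrary in-plane angles with interpolation, which is omitted here. The random volume is a stand-in for a real density, and `common_line` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.normal(size=(16, 16, 16))  # toy density indexed as volume[z, y, x]

# Projections along z and along x (sums of the density along the beam direction).
proj_z = volume.sum(axis=0)  # indexed [y, x]
proj_x = volume.sum(axis=2)  # indexed [z, y]
ft_z, ft_x = np.fft.fft2(proj_z), np.fft.fft2(proj_x)

# Central section theorem: the 2D FT of each projection is a central plane of the 3D FT.
ft_vol = np.fft.fftn(volume)
print(np.allclose(ft_z, ft_vol[0, :, :]), np.allclose(ft_x, ft_vol[:, :, 0]))


def common_line(ft_a, ft_b):
    """Compare central lines (row 0, column 0) of two 2D FTs; return the best-matching pair."""
    lines_a = {"row0": ft_a[0, :], "col0": ft_a[:, 0]}
    lines_b = {"row0": ft_b[0, :], "col0": ft_b[:, 0]}
    best = None
    for name_a, la in lines_a.items():
        for name_b, lb in lines_b.items():
            score = abs(np.vdot(la, lb)) / (np.linalg.norm(la) * np.linalg.norm(lb))
            if best is None or score > best[0]:
                best = (score, name_a, name_b)
    return best
```

Both projections contain the k_y axis of the 3D transform, so `common_line(ft_z, ft_x)` picks column 0 of the first 2D transform and row 0 of the second, with a normalized score of 1.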

A recently introduced approach to ab initio 3D structure deter- 
mination, which shows great promise in producing reliable initial 
models, is based on projection matching using the stochastic hill 
climbing (SHC) algorithm. The SHC strategy was first implemented
in the software package SIMPLE (Elmlund and Elmlund,
2012), and has been expanded to the “validation of individual 
parameter reproducibility” (VIPER) approach, which incorpo- 
rates validation steps into the structure determination process, 
monitoring the orientation parameters (Penczek, 2014a). See 
Supplemental Information for further information on projection 
matching, SHC and VIPER. 

Structure Refinement and Resolution 

After obtaining an initial map, the structure has to be refined to 
obtain the final map (Figures 4A-4D). All single-particle EM pack- 
ages use a more or less elaborate version of the 3D projection 
matching procedure (Figure 3B and Supplemental Information) 
for structure refinement. It modifies the orientation parameters 
of single-particle images (projections) to achieve a better match 
with reprojections computed from the current approximation of 
the structure (Penczek, 2008). While all implementations share 
the same principle of projection matching, the details of the 
methodology and the degree to which the user can control 
the process vary widely and are discussed in Supplemental 
Information. 

Progress of the refinement is monitored by a number of 
indicators, in particular the Fourier shell correlation (FSC) curve 
(Figure 4E), which provides information on the level of the SNR 
as a function of the spatial frequency (Penczek, 2010c), and 
Figure 4. Evaluation and Validation of a 3D
EM Structure 

Critical evaluation of EM structural results is of 
utmost importance due to potential model bias 
and unavoidable noise alignment inherent to 
the single-particle EM structure determination 
method. Ultimate confirmation of the map, 
particularly of the details at the limit of the resolu- 
tion claimed, is best done by independent struc- 
ture determination, possibly using different soft- 
ware packages, even if one uses the same dataset. 
Here, we show the outcomes of two independent
structure determinations of the TRPV1 channel.

(A) Originally, the structure was solved using 
RELION (Scheres, 2012): the refinement was 
initialized with an RCT model, and the final map
represents the best class produced by 3D MRA 
(Liao et al., 2013). 

(B) The structure determination was repeated us- 
ing the same 2D dataset. 2D MRA was performed 
using IMAGIC (van Heel et al., 1996), an initial 
model was generated using EMAN2 (Tang et al., 
2007), and refinement and 3D MRA were done in 
FREALIGN (Grigorieff, 2007; Lyumkis et al., 2013). 

For consistency, the rotationally averaged power spectrum of map (B) was set to that of map (A). Interestingly, while the two maps are visually very similar, only
~60% of particles in the best class determined by RELION coincide with those in the best class determined by FREALIGN. This difference likely reflects limitations
of K-means-based clustering approaches and possibly indicates that the number of classes used was too low.

(C and D) The side-chain densities in the best parts of the map shown in (A) agree with those of the map shown in (B), validating these details. However, some
weaker peripheral density features in the maps shown in (A) and (B) exhibit noticeable differences (see Supplemental Information and Figure S2). 

(E) Angular uncertainty and blurring affect the FSC curves, and thus the resolution reported: calculation of 3D reconstructions using multiple, probability-weighted
copies of each particle image (“soft matching,” see Supplemental Information) can lead to an apparent improvement in the resolution (RELION, black
curve), while hard matching yields more conservative results (FREALIGN, red curve). The difference is, however, too small to affect the interpretation of the maps
and also lies within the general uncertainty bounds of the FSC methodology, which also depends on other data-processing steps, such as masking of
the map.

(F) The resolution of the map is non-uniform. The local resolution of the map shown in (B) was calculated (Penczek, 2014c) and indicates that densities within the
membrane domain, and particularly around the pore, are better resolved than those in the extracellular domains. 3D maps were rendered using UCSF Chimera 
(Pettersen et al., 2004). 




the resolution of the map. The FSC curve is obtained by splitting 
the dataset into halves, calculating a volume from each half, and 
computing correlation coefficients within resolution shells ex- 
tracted from Fourier transforms of the two volumes. Importantly, 
the definition requires that the noise in the two structures
be independent, a condition difficult to meet in practice and often 
compromised by refining a single dataset while evaluating the 
FSC with two structures computed from half-subsets of the 
entire set. “Resolution” in single-particle EM is then a somewhat 
arbitrarily chosen cut-off level of the SNR or FSC curve. For 
example, the resolution can be defined as the spatial frequency 
at which the FSC curve is 0.5 or as the spatial frequency at which 
the SNR is 1.0 (corresponding to an FSC of 0.33), the level at
which the power of the signal is equal to the power of the noise. 
Another common choice of threshold is 0.143, the value selected
based on relating EM results to those in X-ray crystallography 
(Rosenthal and Henderson, 2003). 
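The FSC calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation of any particular package. Shells are defined here by rounding the radius of each Fourier voxel, which is one of several conventions. The hypothetical helper resolution_at reports the resolution at the first shell where the curve falls below a chosen threshold, such as 0.5 or 0.143.

```python
import numpy as np


def fsc(vol1, vol2):
    """Fourier shell correlation between two cubic maps, one value per shell below Nyquist."""
    n = vol1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))
    z, y, x = np.indices(vol1.shape) - n // 2
    # Assign every Fourier voxel to the shell given by its rounded radius.
    shells = np.rint(np.sqrt(x**2 + y**2 + z**2)).astype(int).ravel()
    num = np.bincount(shells, weights=np.real(f1 * np.conj(f2)).ravel())
    den1 = np.bincount(shells, weights=(np.abs(f1) ** 2).ravel())
    den2 = np.bincount(shells, weights=(np.abs(f2) ** 2).ravel())
    nyq = n // 2
    return num[:nyq] / np.sqrt(den1[:nyq] * den2[:nyq])


def resolution_at(fsc_curve, threshold, pixel_size, box_size):
    """Resolution (in units of pixel_size) at the first shell where the FSC drops below threshold."""
    below = np.nonzero(np.asarray(fsc_curve) < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size  # curve never crosses: report the Nyquist limit
    shell = max(int(below[0]), 1)
    return box_size * pixel_size / shell


# Usage on synthetic half-maps: a shared "signal" plus independent noise in each half.
rng = np.random.default_rng(0)
signal = np.zeros((32, 32, 32))
signal[12:20, 12:20, 12:20] = 1.0
half1 = signal + 0.5 * rng.standard_normal(signal.shape)
half2 = signal + 0.5 * rng.standard_normal(signal.shape)
curve = fsc(half1, half2)
for t in (0.5, 0.143):
    print(t, resolution_at(curve, t, pixel_size=1.0, box_size=32))
```

The two half-maps here have truly independent noise by construction. As the text notes, that condition is hard to guarantee with real refinements.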

A common problem in structure refinement is so-called “over-fitting”
of the data: the emergence of features in an EM map due
to the alignment of noise. Over-fitting arises because
the dataset is refined without reference to external standards (at
least before the emergence of secondary-structure features 
whose generic appearance is known), and, therefore, it is not 
known what constitutes “signal” and what is “noise” (Stewart 
and Grigorieff, 2004). As a result, artifacts are created by chance 
and further enhanced by alignment of the noise components in 
the data, leading to inflated FSC values and an artificially high 
resolution. It was realized early on that in order to ensure independence
of noise in the half-dataset maps used to calculate
the FSC, the half-datasets must be refined independently (e.g., 
Grigorieff, 2000). This practice, which avoids exaggerated FSC-based
resolution estimates, has recently been reiterated
(Scheres, 2012) and is now often referred to as the “gold stan- 
dard” refinement procedure (Henderson et al., 2012). It has to 
be noted, however, that even this procedure has limitations, as 
(1) it is impossible to have true independence between the half 
datasets; (2) the approach tends to underestimate the resolution 
potential of the data; and (3) for all existing refinement algo- 
rithms, each of the half-structures suffers independently from 
the described “over-fitting” problem. There are also a number 
of image-processing steps that result in a nominal improvement 
of the resolution without actually improving the image alignment 
parameters (Figure 4E) or map. An obvious example is masking 
of the structure, as the shape of the mask and the way its edges 
are attenuated may have a significant impact on the FSC curve. 
One can also set density values to a constant when they are 
lower than a certain level, a step that is akin to solvent flattening 
in X-ray crystallography. Since none of these operations are 
codified in the field and since the FSC curve is also dependent 
on other factors beyond the ones mentioned here, it is the 
opinion of the authors that there is currently no real “gold standard”
procedure for structure refinement and resolution estimation
of an EM map. An approach equally useful as the “gold
standard” procedure for obtaining an adequate resolution estimate
is simply to limit the frequencies used in refinement to a resolution
lower than that of the reference map.
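Over-fitting through noise alignment can be reproduced with a toy experiment. This minimal sketch assumes translation-only alignment and uses a hypothetical disk-shaped reference. It aligns images containing nothing but Gaussian noise to the reference by cross-correlation and then averages them. The aligned average acquires the shape of the reference, whereas the unaligned average does not.

```python
import numpy as np


def disk_reference(size=32, radius=6):
    """Hypothetical reference: a mean-subtracted, unit-norm disk."""
    y, x = np.mgrid[:size, :size] - size // 2
    ref = (x**2 + y**2 <= radius**2).astype(float)
    ref -= ref.mean()
    return ref / np.linalg.norm(ref)


def align_translation(img, ref):
    """Circularly shift img to the translation that maximizes its cross-correlation with ref."""
    cc = np.real(np.fft.ifft2(np.conj(np.fft.fft2(ref)) * np.fft.fft2(img)))
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    return np.roll(img, shift=(-dy, -dx), axis=(0, 1))


def corr(a, b):
    """Pearson correlation of two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def noise_alignment_demo(n_images=400, size=32, seed=0):
    """Return (corr of aligned-noise average with ref, corr of unaligned average with ref)."""
    rng = np.random.default_rng(seed)
    ref = disk_reference(size)
    noise = rng.standard_normal((n_images, size, size))  # images with no signal at all
    aligned = np.mean([align_translation(img, ref) for img in noise], axis=0)
    return corr(aligned, ref), corr(noise.mean(axis=0), ref)


print(noise_alignment_demo())
```

This is the same mechanism that can create features and inflate FSC values when data are refined against a reference. In this case, the alignment supplies the "signal" rather than the data.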
In conclusion, the quality of an EM map is described by the
entire FSC curve, not just the resolution, and there are EM maps
with the same nominal resolution that differ significantly in overall
quality (Ludtke and Serysheva, 2013). Moreover, the reported
nominal resolution reflects the overall resolution of the entire
density map and does not account for local variation. The EM map with the highest nominal resolution
is not necessarily the best one, because values at lower fre- 
quencies often matter more for connectivity and interpretability 
of the map. Hence, the resolution reported for an EM map should 
be treated as a broad guideline rather than a firm number. 

3D Multi-Reference Alignment 

Many samples will contain structural heterogeneity. When its 
presence is detected (for example by calculating a variance 
map, see below), one option is to use 3D multi-reference alignment
(3D MRA) to extract more homogeneous subsets. Current
implementations are natural extensions of the basic projection- 
matching procedure and employ principles of the K-means algo- 
rithm: the user has to provide a number of initial 3D templates and 
the program aligns each single-particle image to all 3D templates 
to find the best-matching one. When all images are assigned, 
new 3D reconstructions are calculated and used as new references.
The method has proved successful in many applications
(Brink et al., 2004; Heymann et al., 2003; Loerke et al., 2010; 
Schuler et al., 2006), particularly when “focusing” on a variable 
sub-region to make the assignments (Penczek et al., 2006). The 
shortcomings of 3D MRA are those of the K-means algorithm: a 
strong bias toward initial templates, solutions that depend on 
the number K of requested classes, and a lack of validation of 
the results. In light of these limitations, the applicability of 3D 
MRA should be guided by the concurrent examination of the local 
variability of the map (Penczek, 2014c). Indeed, if the procedure
succeeded in separating the dataset into homogeneous classes, 
the distribution of 3D variability within each group should be uni- 
form (in practice it tends to be proportional to the density distribu- 
tion of the map). Any residual local variability that exceeds what is 
reasonably expected, particularly at locations where map density 
is low, signals that 3D MRA should be continued with an 
increased number of classes and possibly in the “focused” 
mode. The 3D MRA procedure works best for complexes exhib- 
iting substoichiometric ligand binding in which a fragmented 
appearance of the ligand would indicate failure of the procedure, 
and results can be validated by the appearance of secondary 
structure elements in the 3D class averages. 
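The K-means logic of 3D MRA can be illustrated with a 2D toy example. This sketch assumes the images are already aligned, so the alignment step of real 3D MRA is omitted. The two "conformations," a disk and a bar, are hypothetical. Each cycle assigns every image to the reference it correlates with best and then replaces each reference with its class average. The number of classes is fixed by the number of initial references supplied, which reflects the dependence on K and on the starting templates noted above.

```python
import numpy as np


def normalize(img):
    """Mean-subtract and scale to unit norm, so dot products are correlation coefficients."""
    img = img - img.mean()
    return img / np.linalg.norm(img)


def multi_reference_classify(images, initial_refs, n_iter=10):
    """K-means-style multi-reference loop on pre-aligned 2D images (toy stand-in for 3D MRA)."""
    refs = [normalize(r) for r in initial_refs]
    normed = np.array([normalize(im) for im in images])
    for _ in range(n_iter):
        # Assignment step: each image goes to the reference it correlates with best.
        scores = np.einsum("nij,kij->nk", normed, np.array(refs))
        labels = np.argmax(scores, axis=1)
        # Update step: class averages become the new references (keep old ref if a class empties).
        refs = [normalize(images[labels == k].mean(axis=0)) if np.any(labels == k) else refs[k]
                for k in range(len(refs))]
    return labels, refs


rng = np.random.default_rng(1)
y, x = np.mgrid[:16, :16] - 8
disk = (x**2 + y**2 <= 16).astype(float)
bar = ((np.abs(y) <= 1) & (np.abs(x) <= 6)).astype(float)
truth = np.repeat([0, 1], 100)
images = np.array([(disk if t == 0 else bar) + 0.5 * rng.standard_normal((16, 16)) for t in truth])
labels, _ = multi_reference_classify(images, [images[0], images[-1]])
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"fraction correctly grouped: {accuracy:.2f}")
```

In this easy case, the classes are recovered. With more similar states, fewer classes than states, or poor starting references, the same loop converges to mixed classes without any warning. This is why the text recommends checking local variability.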

Structure Validation and Interpretation 

As explained above, the indication of a certain resolution by the 
FSC alone does not demonstrate the validity of the refined struc- 
ture. Independent refinement of two exclusive subsets of the 
data increases confidence in the resolution but does not neces- 
sarily confirm the validity of a structure. This is particularly true 
for reconstructions that do not resolve secondary structure features.
Refinement is typically initialized with the same 3D
template, even if low-pass filtered, which undermines the independence
assumption. Furthermore, the FSC may fail entirely to
indicate resolution when there is significant misalignment of 
the particle images. All current refinement software may display
this behavior of the FSC, including software that performs separate
refinement of subsets of the data. It is therefore equally
important to also apply other plausibility criteria to the results 
whenever possible (see below). 

In the case of a heterogeneous dataset, the refinement itself might
be correct, but the structure, being a superposition of various 
states, will have limited biological relevance. Therefore, addi- 
tional tests are recommended, particularly those that reveal the 
localized real-space quality of the map. First, it is possible to 
compute the local resolution of the map using a wavelet-based 
(Kucukelbir et al., 2014) or an FSC-equivalent approach (Penczek,
2014b) (Figure 4F). Local real-space variability of the map
can be assessed using a simplified variance approach (Penczek,
2014c). More information could be obtained from analysis of correlations
within the map, as in 3D PCA, by statistical resampling
(Penczek et al., 2011), which is computationally demanding and 
yields only low-resolution information. A local variability analysis 
can also serve as a means to establish plausible initial templates 
for 3D MRA (Spahn and Penczek, 2009). 
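A voxel-wise variance map can be approximated by statistical resampling. The following toy sketch is not Penczek's procedure. It assumes that each particle already contributes an aligned 3D volume, so that a "reconstruction" reduces to an average, and computes the variance over bootstrap resamples. In the hypothetical example, a ligand present in half of the particles appears as a region of elevated variance.

```python
import numpy as np


def bootstrap_variance(particle_volumes, n_resamples=200, seed=0):
    """Voxel-wise mean and variance over bootstrap 'reconstructions'.

    Toy assumption: each particle already contributes an aligned 3D volume,
    so a 'reconstruction' is just the average of a resampled set.
    """
    rng = np.random.default_rng(seed)
    vols = np.asarray(particle_volumes, dtype=float)
    n = len(vols)
    boot = np.empty((n_resamples,) + vols.shape[1:])
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample particles with replacement
        boot[b] = vols[idx].mean(axis=0)
    return boot.mean(axis=0), boot.var(axis=0, ddof=1)


# Hypothetical sample: a rigid core plus a ligand bound to half of the particles.
rng = np.random.default_rng(3)
vols = 0.5 * rng.standard_normal((300, 8, 8, 8))
vols[:, 2:4, 2:4, 2:4] += 1.0    # core, always present
vols[::2, 5:7, 5:7, 5:7] += 1.0  # ligand, substoichiometric
mean_map, var_map = bootstrap_variance(vols)
print(var_map[5:7, 5:7, 5:7].mean() / var_map[2:4, 2:4, 2:4].mean())
```

With real data, each bootstrap volume would be a full 3D reconstruction from resampled particle images, which is what makes the approach computationally demanding.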

The overall validation of an EM map depends on the resolution 
reached. We can distinguish three resolution regimes that may 
help confirm the resolution indicated by the FSC. A low-resolution
map (>10 Å) reveals the overall shape of a complex and
possibly the relative arrangement of major modules. Here, docking
of X-ray segments is unreliable, and flexible fitting should
be avoided. An intermediate-resolution map (4-10 Å) reveals
secondary structure details and the relative arrangement of
modules. It enables unique fitting of X-ray segments and can
be used to detect conformational changes. A high-resolution
map (<4 Å) clearly resolves secondary structure elements (e.g.,
α helices) and some individual residues, allowing polypeptide
backbone tracing (Figures 4C and 4D) and precise fitting of
X-ray segments. It also provides a detailed description of conformational
changes. Because the precise resolution attached to a map
often cannot be reliably established, one should focus on arguments that give confidence
that the overall appearance of the map is correct. Thus, for 
low-resolution maps, the best evidence is provided by tilt exper- 
iments, particularly by initiating the project by RCT reconstruc- 
tion. While final details of the map might be debatable, at least 
the possibility of major mistakes is minimized. A map at interme- 
diate resolution can be confirmed if the appearance of subunits 
agrees with the appearance of segments determined by X-ray 
crystallography, if available. A measure of confidence can also 
be provided by a posteriori tilt experiments (Henderson et al., 
2011). In these experiments, often referred to as the “tilt test,” a
small set of image pairs is collected, one untilted and a second
with a small sample tilt, for example 10°. The test requires
projection matching of the particles from the tilt pairs to the EM 
map that is to be validated. If the difference between the views 
determined for the tilt pairs corresponds (within error) to the 
known tilt angle, the EM map is considered valid. High-resolution 
maps must display known features of secondary structure ele- 
ments and density for bulky side chains. These features can be 
further corroborated with a plausible atomic model that can 
also be used to obtain an independent resolution estimate by 
converting it to pseudo-electron density and low-pass filtering 
it to the claimed resolution of the EM map. 
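The core comparison in the tilt test can be sketched as follows. This sketch assumes that orientations are expressed as ZYZ Euler angles (a common convention, though packages differ) and that the stage tilt acts as a rotation about the y axis. Under these assumptions, the angle of the relative rotation between the orientations assigned to the two members of a tilt pair should match the nominal tilt. The tolerance and function names are illustrative choices. A full analysis would also check the direction of the tilt axis and examine the distribution over many pairs.

```python
import numpy as np


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def euler_zyz(phi, theta, psi):
    """Rotation matrix from ZYZ Euler angles in degrees (assumed convention)."""
    phi, theta, psi = np.radians([phi, theta, psi])
    return rot_z(psi) @ rot_y(theta) @ rot_z(phi)


def relative_angle(r1, r2):
    """Angle in degrees of the rotation that takes orientation r1 to r2."""
    cos_a = (np.trace(r1.T @ r2) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))


def tilt_pair_consistent(r_untilted, r_tilted, nominal_tilt_deg, tol_deg=3.0):
    """True if the assigned orientations differ by the known tilt to within tol_deg."""
    return abs(relative_angle(r_untilted, r_tilted) - nominal_tilt_deg) <= tol_deg


# Simulated pair: an arbitrary orientation, then the same particle after a 10-degree stage tilt.
r_untilted = euler_zyz(40.0, 25.0, 10.0)
r_tilted = rot_y(np.radians(10.0)) @ r_untilted
print(relative_angle(r_untilted, r_tilted), tilt_pair_consistent(r_untilted, r_tilted, 10.0))
```

In practice, both orientations come from projection matching against the map under test, so a wrong map yields relative angles scattered far from the nominal tilt.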
The interpretation of EM maps depends mainly on three factors:
the resolution of the map, the established presence of multiple
conformational states in the sample, and the availability of
X-ray crystallographic segments of some components, of the 
entire complex or of one of its homologs. High-resolution EM 
maps can be analyzed in the same manner as X-ray maps by per- 
forming de novo backbone tracing. Furthermore, because EM 
experiments yield both amplitudes and phases, it is possible to 
arrive at reliable atomic models even in cases in which crystallo- 
graphic efforts were unsuccessful or comparison with an atomic 
model is difficult. In addition, the availability of local resolution 
and variability measures is helpful in avoiding over-interpretation 
of poorly resolved regions of EM maps. At high resolution, 
docking of X-ray segments can be done with high precision, 
thus increasing apparent resolvability of the results and making 
it possible to detect atomic scale conformational changes with 
respect to the X-ray results. Similarly, availability of high-resolu- 
tion structures of multiple functional states of the complex makes 
single-particle EM a unique tool to study protein dynamics. 

Intermediate-resolution EM maps offer insights into the 
arrangement of subunits and localization of functional sites of 
macromolecular complexes. Structure interpretation can be 
augmented by docking of X-ray segments, if available, which 
also improves the precision of feature localization. The docking 
can be accomplished, for example, using UCSF Chimera (Pet- 
tersen et al., 2004). However, as the resolution of EM maps 
gets worse, so does the precision of docking. While some prog- 
ress has been made in this area, reliable computational tools to 
assess docking uncertainty as a function of map quality are 
lacking, so some caution is needed to avoid over-interpretation 
of the results. 
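A small experiment shows why docking precision degrades with resolution. The sketch below builds pseudo-density maps from a hypothetical set of point atoms by placing a Gaussian at each atom. The Gaussian width is set to 0.225 times the resolution, which is an assumed conversion factor. The sketch then measures how much the map correlation drops when the model is displaced by 2 Å. The blurrier the map, the less the score penalizes the misplacement, so the optimum becomes flatter and less well defined.

```python
import numpy as np


def pseudo_density(coords, box, sigma):
    """Sum of isotropic Gaussians (width sigma, in voxels) centred on point 'atoms'."""
    grid = np.indices((box, box, box)).reshape(3, -1).T.astype(float)
    dens = np.zeros(len(grid))
    for c in coords:
        dens += np.exp(-np.sum((grid - c) ** 2, axis=1) / (2.0 * sigma**2))
    return dens.reshape(box, box, box)


def cc(a, b):
    """Mean-subtracted (Pearson) correlation of two maps."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))


def score_after_shift(coords, resolution, shift, box=24, sigma_factor=0.225):
    """Correlation between a model map and the same model displaced by `shift` voxels along the first axis."""
    sigma = sigma_factor * resolution  # assumed resolution-to-Gaussian-width factor
    ref = pseudo_density(coords, box, sigma)
    moved = pseudo_density(coords + np.array([shift, 0.0, 0.0]), box, sigma)
    return cc(ref, moved)


rng = np.random.default_rng(4)
atoms = rng.uniform(8, 16, size=(20, 3))  # hypothetical 20-atom fragment, 1 voxel = 1 Å
for res in (3.0, 6.0, 12.0):
    print(res, round(score_after_shift(atoms, res, shift=2.0), 3))
```

Real docking also searches rotations and must contend with noise and incomplete models, both of which make the score landscape flatter still.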

The main utility of low-resolution EM maps is in revealing the 
overall architecture of a complex. Results of docking X-ray seg- 
ments should be interpreted with utmost caution, because 
determining the best-fitting position of a given segment does 
not mean that it is its only possible localization, creating the pos- 
sibility of major mistakes. At the same time, low-resolution EM 
maps have the added value that they can often provide a step- 
ping stone toward higher resolution, and thus more informative 
results. 

Concluding Remarks 

Structure determination by single-particle cryo-EM is an 
increasingly popular approach, but like most experimental meth- 
odologies, it is important not to approach it with “plug and play” 
assumptions. We hope that the information provided in this 
Primer will be helpful in guiding the execution of this technique 
and the interpretation of data obtained with it. 
Until only a few years ago, single-particle electron cryo-microscopy (cryo-EM) was usually not the
first choice for many structural biologists because its resolution was limited to the nanometer to
subnanometer range. Now, this method rivals X-ray crystallography in terms of resolution and can be
used to determine atomic structures of macromolecules that are either refractory to crystallization 
or difficult to crystallize in specific functional states. In this review, I discuss the recent breakthroughs
in both hardware and software that transformed cryo-microscopy, enabling understanding
of complex biomolecules and their functions at the atomic level.



A major goal of structural biology is to provide mechanistic 
understanding of critical biological processes. The most detailed 
insights come from atomic structures of macromolecules and 
complexes involved in these processes in relevant functional 
states. Beyond basic research, obtaining atomic structures 
of drug targets is also a standard approach in the pharma- 
ceutical industry in the design and optimization of therapeutic 
compounds. 

Prior to 2013, most atomic structures deposited in the Protein
Data Bank (PDB) were determined by X-ray crystallography. This
technique starts with crystallization of molecules that are homo- 
geneous in both composition and conformation. Once the 3D 
crystals are of sufficient size to diffract X-rays, they are used 
for structure determination. The resolution of crystal structures 
is largely determined by how well the molecules are ordered 
(or aligned to each other) in the crystal. After 100 years of development
and maturation, X-ray crystallography has become a
routine method, delivering a wealth of structural information 
about important biomolecules and cellular processes (Jones, 
2014; Shi, 2014). While X-ray crystallography will continue to 
play an important role in answering many biological questions, 
it depends entirely on the growth of well-ordered 3D crystals.
Producing such crystals, however, is a major bottleneck for challenging
targets, such as integral membrane proteins of mammalian
origin or chromatin in complex with its modifiers. In the last 2
years, single-particle electron cryo-microscopy (cryo-EM) has
emerged as a technique for determining atomic-resolution structures
at better than 4-Å resolution, comparable to many solved
using crystallographic approaches. It has now been used to determine a
number of structures of proteins and complexes that have vexed
crystallographers.

The Way Electron Cryo-Microscopy Works 

Rather than determining structures from diffraction of 3D crys- 
tals, single-particle cryo-EM determines structures by computa- 
tionally combining images of many individual macromolecules 
in identical or similar conformations (Frank et al., 1978). In this
approach, samples of purified molecules in solution are applied
to an EM grid covered with a thin holey carbon film and blotted with
filter paper to remove most of the solution, so that a thin liquid
layer is formed across the holes in the carbon film. The grid is then
plunge-frozen in liquid ethane cooled by liquid nitrogen. This method
was originally developed by Dubochet and colleagues (Dubochet et al., 1982)
and has since been improved significantly by semi-automated plunge-freezing
machines that increase reproducibility. After plunge-freezing, frozen-hydrated
molecules are embedded in a thin layer of vitreous ice (Figure 1A)
that preserves the native structure to the atomic level (Taylor 
and Glaeser, 1974), prevents dehydration of biological samples 
within the vacuum of an electron microscope, and reduces the 
effects of radiation damage (Stark et al., 1996). Molecules 
embedded in a thin layer of vitreous ice adopt a range of 
orientations, which are then imaged using an electron beam 
(Figure 1B). Each particle image is a 2D projection of a molecule,
whose spatial orientation and position are defined by six geo- 
metric parameters. These include three Euler angles and two 
in-plane positional parameters. The sixth parameter is the defo- 
cus that defines the z position along the direction of the electron 
beam and is often assumed to be the same for all particles in a 
micrograph (or image). After further correction for aberrational 
errors of the microscope, a 3D structure can be reconstructed 
by combining images of many molecules that have been aligned 
to each other. The resolution of the reconstruction is improved 
iteratively by refining the first five geometric parameters for 
each particle to high accuracy (Frank, 1996). The final 3D recon- 
struction is a Coulomb potential density map that can be inter- 
preted in the same way as electron density maps determined 
by X-ray crystallography (Figures 1C and 1D).
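The geometric parameters can be illustrated with a minimal sketch that projects a point-atom model into a particle image. The sketch assumes a ZYZ Euler convention (software packages differ) and ignores defocus, the sixth parameter. It rotates by the three Euler angles, drops the coordinate along the beam (z), and adds the two in-plane shifts to obtain the projected positions.

```python
import numpy as np


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project_points(coords, phi, theta, psi, dx, dy):
    """Project 3D point 'atoms' into a 2D particle image.

    phi, theta, psi: ZYZ Euler angles in degrees (assumed convention).
    dx, dy: in-plane shifts. Defocus (the z position) is not modelled here.
    """
    phi, theta, psi = np.radians([phi, theta, psi])
    r = rot_z(psi) @ rot_y(theta) @ rot_z(phi)
    rotated = np.asarray(coords, dtype=float) @ r.T  # rotate each point (row vector)
    return rotated[:, :2] + np.array([dx, dy])       # integrate along z (beam), then shift


atoms = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])
print(project_points(atoms, 0.0, 0.0, 0.0, 1.0, 2.0))   # untilted view, shifted
print(project_points(atoms, 0.0, 90.0, 0.0, 0.0, 0.0))  # side view after a 90-degree rotation about y
```

Refinement works in the opposite direction: it adjusts these five parameters for each particle image until projections of the current map best match the image.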

Both X-ray and electron beams cause radiation damage to 
biological samples. For X-ray diffraction, a larger crystal with 
coherently packed molecules can tolerate a high total dose 
and often diffracts to high resolution because more molecules 
contribute to the diffraction. For single-particle cryo-EM, the to- 
tal electron dose used to image each molecule is set to a very low 
level to preserve structural information at the subnanometer-resolution
level. The consequence of such low-dose imaging is that
individual images have a very poor signal-to-noise ratio (SNR). 
Hence, images from many identical or similar molecules must 
be averaged to enhance the SNR as well as to provide the 
different views needed to calculate a 3D reconstruction (De 
Rosier and Klug, 1968). Therefore, the total number of particle 
images used in a reconstruction has a similar significance to 
the size of a 3D crystal. Similarly, the accuracy of image align- 
ment in single-particle EM is analogous to how well molecules 
are packed in a 3D crystal. 
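The effect of averaging on SNR can be checked with a simulation. The simulation assumes perfectly aligned, identical copies and additive white Gaussian noise. Under these idealized conditions, averaging N images reduces the noise variance N-fold, so the SNR (defined here as signal variance over noise variance) grows in proportion to N. The "particle" in the example is a hypothetical Gaussian blob.

```python
import numpy as np


def snr(estimate, signal):
    """SNR as signal variance over residual-noise variance (signal known in this simulation)."""
    return float(np.var(signal) / np.var(estimate - signal))


def averaging_gain(n_images=100, noise_sigma=3.0, size=64, seed=0):
    """Return (SNR of one noisy image, SNR of the average of n_images noisy copies)."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[:size, :size] - size // 2
    signal = np.exp(-(x**2 + y**2) / (2 * 8.0**2))  # hypothetical smooth "particle"
    images = signal + noise_sigma * rng.standard_normal((n_images, size, size))
    return snr(images[0], signal), snr(images.mean(axis=0), signal)


single, averaged = averaging_gain()
print(f"single image SNR {single:.4f}, average of 100 images SNR {averaged:.4f}")
```

With real datasets, the gain is smaller because alignment errors and heterogeneity attenuate the high-resolution signal during averaging.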

Provided that a sufficient number of images containing high- 
resolution information are classified and aligned accurately, sin- 
gle-particle cryo-EM will produce a 3D reconstruction at atomic 
resolution. An atomic model can then be built de novo based 
on fitting the known sequences into the density map from the 
reconstruction. Furthermore, electron micrographs are real- 
space images containing both amplitude and phase information. 
Thus, cryo-EM structure determination does not have a “phase 
problem” as in X-ray crystallography, but its amplitudes are 
less accurate than those measured by X-ray diffraction.

Resolution Determinants of Single-Particle Cryo-EM 
Reconstructions 

Considering the scattering power of electrons versus X-rays, and 
the amount of information present in an image of a single mole- 
cule that can be used to determine the precise position and 
orientation of the molecule, Richard Henderson predicted that 
single-particle cryo-EM can, in theory, determine atomic-reso- 
lution structures of biological molecules as small as 100 kDa in 
molecular weight (Henderson, 1995). However, there are many
practical limitations that resulted in a
gap between what physics allows and 
what can be accomplished by using the 
existing technologies. Some limitations 
are related to the intrinsic properties of 
low-dose imaging of frozen hydrated bio- 
logical molecules, while others are related 
to the properties of frozen-hydrated
samples used in single-particle cryo-EM 
(Typke et al., 2004). Overcoming these 
obstacles took many years, but by 2008, it was possible to 
achieve resolutions that were sufficient to visualize side-chain 
densities (~3.8 Å) (Yu et al., 2008; Zhang et al., 2008), and to
determine the first de novo atomic structure (3.3 Å) of a non-enveloped
icosahedral virus (Zhang et al., 2010). Because of
their large sizes and high symmetry, icosahedral virus particles 
were among the first for which high-resolution maps were ob- 
tained, and now it is quite feasible to determine reconstructions 
of such samples at resolutions better than 4 Å (Chen et al., 2009;
Wolf et al., 2010; Yu et al., 2011). However, it has been much 
harder to achieve similar resolutions for molecules that are 
smaller and/or less symmetric. 

Nowadays, an electron microscope with 200 kV or 300 kV ac- 
celeration voltage and a field emission gun (FEG) electron source 
can typically deliver images with a resolution of better than 2 Å.
Therefore, the achievable resolution of single-particle cryo-EM is
not limited by the resolving power of a modern microscope
itself, but rather by the conditions required to image frozen- 
hydrated biological samples and the unique properties of such 
samples. Determining a high-resolution 3D reconstruction re- 
quires that 2D projection images contain sufficient information 
at both high and low resolutions. The amount of high-resolution 
information present in images determines the possible final res- 
olution of a 3D reconstruction. However, low-resolution informa- 
tion, i.e., image contrast, is also required to visualize particles. 
Together, they determine how well a homogeneous set of 
molecules can be computationally selected for averaging, how 
accurately these images can be aligned, and the total number 
of images that are required to achieve a certain resolution. For 
any electron micrograph, both image amplitudes and phases
are modulated by the contrast transfer function (CTF) of the 



Figure 1. Single-Particle Cryo-EM 

(A) Purified biological molecules are embedded
in a thin layer of vitreous ice, in which they ideally
adopt random orientations. Their positions and
orientations are specified by the in-plane position
parameters x and y and three Euler angles α, β, and γ,
which are refined iteratively to high accuracy. The
defocus values of the images are currently often
determined separately.

(B) Typical image of frozen-hydrated archaeal 20S
proteasomes.

(C) 3D reconstruction of the 20S proteasome at
3.3-Å resolution.

(D) Side-chain densities of the map shown in
(C) are comparable with those seen in maps
determined by X-ray crystallography at a similar
resolution.
Figure 2. Influence of CTF on Image Contrast and Resolution

(A and B) Image of human transferrin receptor-transferrin complex recorded 
using a scintillator camera. The microscope was equipped with a FEG and 
operated at 200 kV. Particles in the image recorded with a defocus of 1.2 µm (A)
are almost invisible but show strong contrast in the image recorded with
a defocus of 3.0 µm (B).

(C) Simulations of the CTF at 1.2 µm (red) and 3.0 µm (blue) defocus, with an
acceleration voltage of 200 kV and an angular spread of 0.07 mrad. Note that although
3 µm defocus generates sufficient contrast for particles with a molecular
weight of ~300 kDa, the CTF envelope drops close to zero at ~3- to 4-Å resolution.



microscope, which is mostly a sine function with an envelope 
that reduces the amplitude at high resolution, as shown
in Figure 2C. The overall envelope of the CTF combines
effects from many factors, including the spatial and temporal 
coherence of the electron beam, specimen motion, the modula- 
tion transfer function (MTF) of the image recording device, and 
others. The contribution of the spatial coherence to the envelope 
is also a function of the defocus. A small defocus maximizes the 
envelope at high resolution but minimizes the CTF at low resolu- 
tion. Thus, to obtain the best high-resolution signal, an image 
must be recorded with a small defocus, which, however, results
in poor image contrast. The converse is also true: to obtain
good contrast, an image has to be recorded with a relatively
large defocus, which, however, reduces the high-resolution
signal (Figure 2). Both low- and high-resolution signals are further 
reduced by the MTF of the image-recording device. 
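The defocus trade-off can be explored numerically. The sketch below evaluates a phase-contrast CTF together with the spatial-coherence envelope, using the relativistic electron wavelength and the standard partial-coherence expression. Several elements are assumptions: the spherical aberration (2 mm), the interpretation of the 0.07-mrad angular spread as an illumination semi-angle, and the sign convention. Amplitude contrast, the temporal-coherence envelope, and the detector MTF are ignored. With these settings, the spatial envelope at 3.5 Å remains substantial at 1.2 µm defocus but nearly vanishes at 3.0 µm, which is qualitatively consistent with Figure 2C.

```python
import numpy as np

H = 6.62607015e-34     # Planck constant (J s)
M0 = 9.1093837015e-31  # electron rest mass (kg)
QE = 1.602176634e-19   # elementary charge (C)
C = 299792458.0        # speed of light (m/s)


def electron_wavelength(voltage):
    """Relativistic electron wavelength in Å for an accelerating voltage in volts."""
    ev = QE * voltage
    return 1e10 * H / np.sqrt(2.0 * M0 * ev * (1.0 + ev / (2.0 * M0 * C**2)))


def ctf_with_spatial_envelope(k, defocus, voltage=200e3, cs=2.0e7, alpha=0.07e-3):
    """Phase-contrast CTF and spatial-coherence envelope.

    k: spatial frequency (1/Å); defocus: underfocus in Å (positive);
    cs: spherical aberration in Å (2 mm assumed); alpha: illumination semi-angle (rad).
    Amplitude contrast and the temporal-coherence envelope are ignored.
    """
    lam = electron_wavelength(voltage)
    gamma = np.pi * lam * defocus * k**2 - 0.5 * np.pi * cs * lam**3 * k**4
    envelope = np.exp(-(np.pi * alpha / lam) ** 2 * (cs * lam**3 * k**3 - defocus * lam * k) ** 2)
    return -np.sin(gamma) * envelope, envelope


k = 1.0 / 3.5  # spatial frequency corresponding to 3.5 Å
for dz_um in (1.2, 3.0):
    _, env = ctf_with_spatial_envelope(k, dz_um * 1e4)
    print(f"{dz_um} um defocus: spatial envelope at 3.5 A = {env:.2f}")
```

The defocus term in the envelope grows linearly with defocus, which is why a larger defocus, chosen for contrast, suppresses the high-resolution signal even with a coherent source.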

This is not a serious problem for a radiation-resistant spec- 
imen. Using a sufficient electron dose, a modern electron micro- 
scope can image, for example, a single layer of graphene at 
atomic or near-atomic resolution with good contrast (Urban, 
2011). The weak low-resolution signal is compensated by a 
high-electron dose, which generates sufficient image contrast. 
However, this approach is not possible for biological samples, 
which are sensitive to radiation damage (Henderson and 
Glaeser, 1985). To visualize frozen-hyd rated biological mole- 



cules with sufficient contrast, one has to record images with 
some defocus (Figures 2A and 2B), which causes a reduction 
in the high-resolution signal (Figure 2C). Hence, imaging 
frozen-hydrated biological molecules always requires a fine balance
between contrast and resolution. Note that this balance
is always influenced by the microscope hardware, such as the 
spatial coherence of the electron beam, the image recording de- 
vice, etc., as well as by the size and symmetry of the molecule 
being studied. 

The first breakthrough in boosting the resolution of single- 
particle cryo-EM maps came from the use of FEGs, which 
generate an electron beam with much better spatial coherence 
than a thermionic electron source (Zhou and Chiu, 1993). While
FEGs do not change the oscillation of the CTF function, at the 
same defocus, high-resolution signal is better preserved in 
images recorded with a microscope equipped with a FEG than 
with a thermionic electron source. FEGs thus enable structure
determinations at subnanometer resolutions for molecules
ranging from icosahedral viruses (Böttcher et al., 1997; Conway
et al., 1997; Zhou et al., 2000) to molecules as small as ~300 kDa
with only 2-fold symmetry (Cheng et al., 2004).

Manufacturers have made many efforts to improve microscope
performance. State-of-the-art electron microscopes nowadays
use constant-power electromagnetic lenses to improve stability,
parallel illumination to reduce image phase error induced by
beam tilt (Glaeser et al., 2011), very high vacuum to reduce water
contamination on frozen-hydrated samples loaded into the microscope
column, and better computer control for sophisticated
and automated microscope tuning and data acquisition (Suloway
et al., 2005). All these features helped improve both the efficiency
of single-particle cryo-EM and the resolution it can achieve,
and they eventually enabled the first de novo atomic
structure determination of an icosahedral virus (Zhang et al., 
2010). Large and highly symmetrical particles, such as icosahe- 
dral viruses, have certain advantages in achieving better resolu- 
tion by single-particle cryo-EM. They can be imaged with very 
small defocus to preserve the high-resolution signal while still 
providing sufficient image contrast. However, the same approach
does not work for small molecules. Images of small molecules 
must be recorded using a much larger defocus, thus trading 
high-resolution signal for image contrast. The need to use a rela- 
tively large defocus to generate image contrast was a major 
obstacle in achieving even subnanometer-resolution maps for 
proteins smaller than 300 kDa without high symmetry. Over- 
coming these limitations required new technologies. The simple 
use of small defocus without any other means to generate suffi- 
cient image contrast led to featureless images and controversial 
results (Henderson, 2013; Mao et al., 2013). 
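A back-of-the-envelope calculation makes this defocus trade-off concrete. The sketch below assumes a weak-phase object and a simplified CTF of the form sin(πλΔf k²), ignoring spherical aberration, amplitude contrast, and envelope functions (sign conventions also vary between references); the function names are hypothetical. Under these assumptions the first CTF zero lies at a resolution of √(λΔf), so a larger defocus moves the contrast-carrying first lobe toward lower spatial frequencies, at the cost of faster oscillation (and, in practice, stronger envelope damping) at high resolution.

```python
import math

# Physical constants (SI units)
H = 6.62607015e-34        # Planck constant, J*s
M0 = 9.1093837015e-31     # electron rest mass, kg
E = 1.602176634e-19       # elementary charge, C
C = 299792458.0           # speed of light, m/s


def electron_wavelength_angstrom(kv: float) -> float:
    """Relativistic electron wavelength (in Å) for an accelerating voltage in kV."""
    ev = E * kv * 1e3  # kinetic energy in joules
    momentum = math.sqrt(2 * M0 * ev * (1 + ev / (2 * M0 * C**2)))
    return H / momentum * 1e10


def simplified_ctf(k: float, defocus_angstrom: float, wavelength: float) -> float:
    """Phase-contrast transfer sin(pi * lambda * df * k^2); Cs and envelopes ignored."""
    return math.sin(math.pi * wavelength * defocus_angstrom * k**2)


def first_zero_resolution(defocus_angstrom: float, wavelength: float) -> float:
    """Resolution (Å) of the first nonzero CTF zero, where lambda * df * k^2 = 1."""
    return math.sqrt(wavelength * defocus_angstrom)


lam = electron_wavelength_angstrom(300)
for df_um in (0.5, 0.9, 3.0):
    df = df_um * 1e4  # micrometres -> Å
    print(f"defocus {df_um} um: first CTF zero at ~{first_zero_resolution(df, lam):.1f} Å")
```

At 300 kV (λ ≈ 0.0197 Å) this gives a first zero near 13 Å for a 0.9 µm defocus but near 24 Å for 3 µm, which is why small defocus preserves more high-resolution signal yet yields less low-resolution contrast.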

Recent Technological Advances in Single-Particle Cryo-EM

Several recent technological advances led to a major breakthrough
in achievable resolution, resulting, in a short period of time, in
3- to ~5-Å-resolution structures of biological molecules ranging
from ribosomal particles to integral membrane proteins (Allegretti
et al., 2014; Amunts et al., 2014; Liao et al., 2013; Lu et al.,
2014; Vinothkumar et al., 2014). Some of these structures were
determined for proteins with known atomic structures, validating
the methodological advancements (Bartesaghi et al., 2014; Li
et al., 2013). Others were determined ab initio for proteins that
had resisted crystallization for years (Liao et al., 2013; Lu
et al., 2014). Here, I will briefly summarize the recent
technological advancements and how they enabled a “resolution
revolution” (Kühlbrandt, 2014).

Camera Technology 

Image-recording devices are characterized by the detective
quantum efficiency (DQE), which describes the signal and noise
performance in a digitally recorded image over the spatial
frequency range (McMullan et al., 2009a; Mooney, 2007).
Traditionally, EM images were recorded on photographic film that
was subsequently digitized or with scintillator-based digital
cameras, such as charge-coupled device (CCD) or complementary
metal-oxide-semiconductor (CMOS) cameras. These cameras use a
thin layer of phosphor scintillator to convert electron signals to
photons, which are coupled through fiber optics to the camera
sensor. Photographic film has a relatively poor DQE at low
spatial frequency, leading to poor image contrast. Thus, recording
on photographic film typically requires imaging at a higher
defocus to ensure sufficient contrast for reliable particle
picking and accurate image alignment. Scintillator-based cameras
have a better low-frequency DQE than photographic film, resulting
in better image contrast. However, the high-frequency DQE of
these cameras is significantly poorer than that of film, making
them less suitable for high-resolution imaging (Booth et al.,
2006; Meyer et al., 2000).

Figure 3. Direct Electron Detection Camera Enabled Major
Breakthroughs in Single-Particle Cryo-EM

(A) An image of frozen-hydrated T. acidophilum 20S proteasome
recorded using a K2 Summit camera with a 300-kV microscope and a
defocus of ~0.9 µm.

(B) Fourier transform of a typical imperfect image of
frozen-hydrated 20S proteasome, showing a predominant resolution
cutoff caused by beam-induced motion.

(C) Fourier transform of the same image after motion correction.
The Thon ring is restored to a resolution of ~3 Å.

(D) 2D class averages of TRPV1 ion channel calculated from images
recorded with a scintillator camera (left) and K2 Summit camera
(right) (Liao et al., 2013).

(E) Two different views of TRPV1 3D reconstruction determined
from a dataset collected with a scintillator camera.

(F) Same views of the TRPV1 3D reconstruction determined from a
dataset collected with a K2 Summit camera. (A-C) are reproduced
from Li et al. (2013).
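For reference, the DQE is commonly written as the squared ratio of output to input signal-to-noise ratios at each spatial frequency ω. This is the standard textbook form rather than a formula taken from the cited papers; for an incident beam with Poisson statistics, the input SNR² equals the mean number of electrons N per area:

```latex
\mathrm{DQE}(\omega) = \frac{\mathrm{SNR}_{\mathrm{out}}^{2}(\omega)}{\mathrm{SNR}_{\mathrm{in}}^{2}(\omega)},
\qquad
\mathrm{SNR}_{\mathrm{in}}^{2} = N \;\Rightarrow\; \mathrm{SNR}_{\mathrm{out}}^{2}(\omega) = \mathrm{DQE}(\omega)\, N .
```

A perfect detector has DQE = 1 at all frequencies. A detector with DQE(ω) = 0.5 needs twice the electron dose to reach the same output SNR at that frequency, which matters because radiation damage caps the usable dose.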

As their name suggests, the new direct 
electron detection cameras no longer 
convert electron signals to light signals 
but detect the electrons directly (McMullan et al., 2009b; McMul- 
lan et al., 2009c). All commercially available direct detection 
cameras have significantly higher DQEs than photographic film 
and scintillator-based cameras in both the low- and high-resolu- 
tion regimes (Li et al., 2013; McMullan et al., 2014; Ruskin et al., 
2013). These cameras typically operate in one of two distinct
modes: the linear charge-integration mode or the electron-counting
mode. In the linear mode, charges generated from electrons
striking the detector are integrated, while in the counting mode
individual electron events are identified and counted. An
advantage of operating in the counting mode is that both Landau
noise (i.e., the fluctuation in the energy deposited by each
electron striking the camera sensor) and readout noise are
eliminated. Combining direct electron detection with
single-electron counting further improves the DQE significantly,
particularly at low frequencies (Li et al., 2013).
Electron-counting cameras thus enable recording low-dose cryo-EM
images of small particles with much smaller defocus values
(Figure 3A), providing a much better balance between the
requirements for image contrast and high-resolution signal (Li
et al., 2013; Liao et al., 2013; Lu et al., 2014).
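Why counting removes Landau noise can be illustrated with a toy single-pixel Monte Carlo. It is a minimal sketch under loudly hypothetical assumptions, not a model of any real camera: electrons per pixel are Poisson-distributed, each deposits a lognormal amount of charge (a stand-in for the Landau distribution), readout adds Gaussian noise, and events never overlap, so coincidence loss (which limits dose rate in real counting detectors) is ignored. The zero-frequency DQE is estimated as the output SNR² divided by the mean electron number.

```python
import math
import random
import statistics

# Hypothetical lognormal parameters for charge deposited per electron
# (a stand-in for Landau fluctuations).
LOG_MU, LOG_SIGMA = 4.0, 0.6


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def integrating_readout(n_mean, rng, read_noise):
    """Linear mode: sum the deposited charge, then add one readout-noise draw."""
    n = poisson(n_mean, rng)
    charge = sum(rng.lognormvariate(LOG_MU, LOG_SIGMA) for _ in range(n))
    return charge + rng.gauss(0.0, read_noise)


def counting_readout(n_mean, rng, read_noise, threshold):
    """Counting mode: every event above threshold counts as exactly one electron."""
    n = poisson(n_mean, rng)
    return sum(
        1
        for _ in range(n)
        if rng.lognormvariate(LOG_MU, LOG_SIGMA) + rng.gauss(0.0, read_noise) > threshold
    )


def dqe_estimate(samples, n_mean):
    """Zero-frequency DQE ~ SNR_out^2 / SNR_in^2, with SNR_in^2 = n_mean (Poisson)."""
    mean = statistics.fmean(samples)
    return (mean**2 / statistics.pvariance(samples)) / n_mean


rng = random.Random(0)
n_mean, trials = 5.0, 20000
dqe_integrating = dqe_estimate(
    [integrating_readout(n_mean, rng, read_noise=5.0) for _ in range(trials)], n_mean)
dqe_counting = dqe_estimate(
    [counting_readout(n_mean, rng, read_noise=5.0, threshold=20.0) for _ in range(trials)],
    n_mean)
print(f"integrating: {dqe_integrating:.2f}  counting: {dqe_counting:.2f}")
```

In this toy, the integrating mode loses DQE to the spread in deposited charge, while the counting mode loses only the few events that fall below threshold. The threshold is set well above the read noise so that noise-only false events are negligible.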

Another important feature of the newly developed direct 
detection cameras is their fast frame readout rate. It enables 
the already low total electron dose used to image biological sam- 
ples to be fractionated into many subframes. Computational 
alignment of these subframes before averaging them can correct 
for motion-induced image blurring, which results from beam- 
induced image motion and mechanical instability of the spec- 
imen holder (Bai et al., 2013; Brilot et al., 2012; Campbell et al., 
2012; Li et al., 2013). The combination of dose fractionation 
and motion correction greatly improves the efficiency of data 
acquisition, because nearly all images can be corrected to 
recover high-resolution information (Figures 3B and 3C). It also 
provides novel means to optimize usage of the total electron 
dose (Baker and Rubinstein, 2010). The contrast can be maximized
by using a higher total dose and all frames for particle
alignment, while the later frames, which record molecules that
have accumulated a higher electron dose and thus more severe
radiation damage, can be eliminated or properly down-weighted to
minimize the effect of radiation damage on the final 3D
reconstruction (Li et al., 2013; Scheres, 2014). These novel
technologies are now being applied in
many cryo-EM laboratories. They marked the beginning of a 
new era in single-particle cryo-EM, in which atomic structures 
of a broad range of biological macromolecules can be deter- 
mined de novo and without crystallization (Figures 3D-3F). 
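The combination of dose fractionation, motion correction, and dose weighting can be sketched in a deliberately simplified 1D form. The assumptions are hypothetical: whole-frame integer translations, estimated by circular cross-correlation against the first frame, and one scalar exponential weight per frame derived from an assumed critical dose. Real motion-correction programs align to running averages, estimate subpixel and local motion, and use frequency-dependent weights, so this is not the algorithm of any cited paper.

```python
import math
import random


def circular_shift(signal, shift):
    """Roll a 1D signal to the right by `shift` samples, with wrap-around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def estimate_shift(reference, frame):
    """Integer shift s maximizing the circular cross-correlation
    sum_i reference[i] * frame[i + s]."""
    n = len(reference)
    scores = [sum(reference[i] * frame[(i + s) % n] for i in range(n)) for s in range(n)]
    best = max(range(n), key=scores.__getitem__)
    return best - n if best > n // 2 else best  # report shifts in (-n/2, n/2]


def dose_weights(dose_per_frame, n_frames, critical_dose):
    """Hypothetical weights: signal at one spatial frequency is assumed to decay
    as exp(-D / (2 * critical_dose)) with accumulated (mid-frame) dose D."""
    raw = [math.exp(-dose_per_frame * (i + 0.5) / (2 * critical_dose))
           for i in range(n_frames)]
    total = sum(raw)
    return [w / total for w in raw]


def align_and_weight(frames, weights):
    """Align every frame to the first one, then form a dose-weighted sum."""
    shifts = [estimate_shift(frames[0], f) for f in frames]
    aligned = [circular_shift(f, -s) for f, s in zip(frames, shifts)]
    n = len(frames[0])
    return shifts, [sum(w * f[i] for w, f in zip(weights, aligned)) for i in range(n)]


# Synthetic movie: a 1D "particle" drifting a few pixels per frame, plus noise.
rng = random.Random(1)
n = 64
truth = [math.exp(-((i - 20) ** 2) / 8) + 0.6 * math.exp(-((i - 40) ** 2) / 4)
         for i in range(n)]
true_shifts = [0, 1, 2, 4, 5, 7, 8, 9]  # simulated beam-induced drift, in pixels
frames = [[v + rng.gauss(0.0, 0.05) for v in circular_shift(truth, d)] for d in true_shifts]
weights = dose_weights(dose_per_frame=5.0, n_frames=len(frames), critical_dose=20.0)
shifts, average = align_and_weight(frames, weights)
print(shifts)
```

Summing the frames without alignment would smear the particle over the drift path. Aligning first and then weighting early (less damaged) frames more heavily keeps both sharpness and the high-resolution signal.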

Maximum Likelihood-Based Classification

A major advantage of single-particle cryo-EM is that it does not 
require absolute sample homogeneity. Computational image 
analysis can deal with a certain level of heterogeneity, both 
conformational and compositional. Such heterogeneity may pre- 
vent crystallization, but in single-particle EM, particles can be 
computationally sorted into different classes, some of which 
may contain relatively homogeneous subsets of particles. Sin- 
gle-particle cryo-EM datasets consist of 2D projection images. 
As determining the orientation parameters of these 2D projec- 
tions is intertwined with the classification of a heterogeneous 
dataset into homogeneous subsets, it is always challenging to 
distinguish whether different 2D projection images represent 
different views of the same molecule or views of molecules 
with different conformations or compositions. While there are 
many ways to classify particles according to their conformations 
or functional states, a particularly powerful approach is to use a 
maximum likelihood-based method for classification and refine- 
ment (Scheres et al., 2007). Implementing sophisticated 
maximum likelihood-based classification and refinement algo- 
rithms (Sigworth, 1998; Sigworth et al., 2010) into user-friendly 
software packages (Lyumkis et al., 2013; Scheres, 2010, 2012) 
made this method easy to use in practice. It has become routine 
now to classify particle images into different 3D classes, each of 
which may be amenable to refinement into higher-resolution 
reconstructions than the global ensemble. The process of 3D 
classification may separate a number of conformations of the 
molecule being studied, or separate fully intact particles or com- 
plexes from incomplete, truncated or fragmented complexes, or 
from those damaged during vitrification (Fernandez et al., 2013;
Liao et al., 2014). Note that the better image quality provided by
direct detection cameras and motion correction was key to the
success of these classification procedures. Almost all newly
published near-atomic resolution 3D reconstructions have, in one
way or another, utilized such classification procedures.
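The key idea of maximum likelihood classification, marginalizing over unknown orientations instead of committing to a single best alignment, can be shown with a toy expectation-maximization loop. In this sketch the "images" are 1D signals, the "orientations" are circular shifts, the noise level is assumed known, and both class references are seeded with individual images (one of each kind, to keep the toy deterministic). All of these are simplifying assumptions; it is not the algorithm implemented in any of the cited software packages.

```python
import math
import random


def circular_shift(signal, shift):
    """Roll a 1D signal to the right by `shift` samples, with wrap-around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def ml_classify(images, refs, sigma, n_iter=8):
    """Toy EM classifier that marginalizes over unknown circular shifts."""
    n, n_classes = len(images[0]), len(refs)
    priors = [1.0 / n_classes] * n_classes
    for _ in range(n_iter):
        # E-step: posterior probability of every (class, shift) pair per image.
        posteriors = []
        for x in images:
            logp = {}
            for k, ref in enumerate(refs):
                for s in range(n):
                    d2 = sum((a - b) ** 2 for a, b in zip(x, circular_shift(ref, s)))
                    logp[(k, s)] = math.log(max(priors[k], 1e-300)) - d2 / (2 * sigma**2)
            top = max(logp.values())  # log-sum-exp trick avoids underflow
            w = {key: math.exp(v - top) for key, v in logp.items()}
            z = sum(w.values())
            posteriors.append({key: v / z for key, v in w.items()})
        # M-step: each reference becomes the posterior-weighted average of the
        # images after undoing every candidate shift; priors follow class mass.
        new_refs, new_priors = [], []
        for k in range(n_classes):
            acc, mass = [0.0] * n, 0.0
            for x, post in zip(images, posteriors):
                for s in range(n):
                    p = post[(k, s)]
                    mass += p
                    for i in range(n):
                        acc[i] += p * x[(i + s) % n]
            new_refs.append([v / max(mass, 1e-300) for v in acc])
            new_priors.append(mass / len(images))
        refs, priors = new_refs, new_priors
    labels = [max(range(n_classes), key=lambda k: sum(post[(k, s)] for s in range(n)))
              for post in posteriors]
    return refs, labels


rng = random.Random(7)
n, sigma = 24, 0.2
shape_a = [math.exp(-((i - 12) ** 2) / 18) for i in range(n)]  # one broad lobe
shape_b = [math.exp(-((i - 8) ** 2) / 2) + math.exp(-((i - 16) ** 2) / 2)
           for i in range(n)]  # two narrow lobes
images, truth = [], []
for j in range(60):
    label = j % 2
    shape = shape_a if label == 0 else shape_b
    images.append([v + rng.gauss(0.0, sigma) for v in circular_shift(shape, rng.randrange(n))])
    truth.append(label)
refs, labels = ml_classify(images, [images[0], images[1]], sigma)
correct = sum(got == want for got, want in zip(labels, truth))
print(correct, "of", len(truth), "images correctly classified")
```

Each image contributes to every reference in proportion to its posterior probability, so ambiguous images do not bias a single class as they would with hard, best-match assignment.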

The use of automated data acquisition (Suloway et al., 2005) 
with automated particle-picking procedures enables collecting 
very large datasets with millions of particle images in relatively 
short periods of time. With large numbers of particles it will be 
possible to classify particle images with very subtle conforma- 
tional differences and thus to detect and quantify even subtle 
conformational states that exist within a population. This has 
been achieved at a somewhat moderate resolution (Fischer et al.,
2010). It is only a matter of resources and time before
single-particle cryo-EM is able to provide solution structures of molecules
in multiple conformations at near-atomic resolution and to pro- 
vide quantitative comparisons of population occupancies under 
different conditions. 

Single-Particle Cryo-EM Is Complementary to X-Ray Crystallography

There are many large protein assemblies and dynamic com- 
plexes that are difficult or may even be impossible to crystallize. 
Thus, single-particle cryo-EM has always been viewed as a sup- 
plementary method to X-ray crystallography for studying such 
assemblies or complexes, such as clathrin coats (Fotin et al., 
2004), the 26S proteasome (da Fonseca et al., 2012; Lander 
et al., 2012; Lasker et al., 2012), the anaphase promoting com- 
plex (Chang et al., 2014; da Fonseca et al., 2011), and chromatin 
fibers (Song et al., 2014), to name just a few. In these studies, 
structures were typically determined by single-particle methods 
to subnanometer resolution. Crystal structures of domains and 
fragments or sequence-based homology models were then 
fitted into the cryo-EM density maps by molecular dynamics
simulations or other computational methods (DiMaio et al., 2015;
Seidelt et al., 2009; Trabuco et al., 2009; Zhao et al., 2013).
Such hybrid approaches made subnanometer-resolution structures of
integral membrane proteins, for example, highly informative,
providing rich structural insights into large membrane protein
complexes (Efremov et al., 2015; Vinothkumar et al., 2014) and
allowing function-related conformational changes to be dissected
(Kim et al., 2014; Meyerson et al., 2014).

With the resolution improved to a level sufficient for sequence- 
based de novo model building, structures determined by single- 
particle cryo-EM are comparable to those determined from 
crystals (Bartesaghi et al., 2014; Li et al., 2013). Therefore, for 
many difficult crystallographic targets, either because they are 
refractory to crystallization or difficult to express and purify in 
sufficient quantities for crystallization, single-particle cryo-EM 
is becoming the method of choice for structure determination.
Recent successes in structure determination of mammalian integral
membrane proteins clearly demonstrated this capability (Liao
et al., 2013; Lu et al., 2014; Yan et al., 2015; Zalk et al.,
2015). Even for targets that can be crystallized, it is now
feasible to use single-particle cryo-EM to determine
high-resolution structures in specific functional states or in
complexes with cofactors. We can anticipate that such successes
will continue rapidly, with many more structures of various types
of biological molecules being determined to near-atomic
resolution. Besides large complexes such as ribosomes and
icosahedral viruses, integral membrane proteins and membrane
protein complexes will be a major area in which single-particle
cryo-EM plays a role as significant as that of X-ray
crystallography. Another class of targets that is difficult to
crystallize but suitable for single-particle cryo-EM is chromatin
in complex with its modifiers, which is essential for
understanding the complexities of gene expression. Crystallographic
studies in this area have progressed slowly, with only a few
atomic structures available for nucleosomes alone or in complex
with modifiers, each of which has led to major discoveries in
chromatin biology (Cramer, 2014). Recent work, although still
limited to nanometer resolution, has shown the tremendous promise
of single-particle cryo-EM in this important structural biology
field (Song et al., 2014).

Future Perspectives for Single-Particle Cryo-EM 

Without a doubt, single-particle cryo-EM is no longer “blob- 
ology” but is now a method that can provide resolutions compa- 
rable with X-ray crystallography. However, unlike X-ray crystal- 
lography, which often ends up with a binary result of either 
having or not having a diffracting crystal, single-particle cryo- 
EM always yields some information (although not always at 
atomic resolution). Even a reconstruction at a modest resolution
provides information on how to improve the sample preparation as
well as valuable biological insights. Thus, single-particle cryo-EM is
probably even more attractive than X-ray crystallography in 
studying macromolecules. 

However, the technology of single-particle cryo-EM is still far
from perfect, and technological developments are still moving
forward rapidly. The current resolution is still unsatisfactory in
many ways. For example, extending the achievable resolution
beyond 3 Å is necessary to convincingly visualize the location of
ions, or to visualize not only where but also how small ligand
molecules bind to target proteins. The latter is of particular
interest for the pharmaceutical industry because it can
facilitate structure-based drug design and optimization. A recent
review discusses in detail the current technical limitations of
single-particle cryo-EM, particularly in achieving higher
resolution, and possible solutions (Agard et al., 2014). Related
to insufficient resolution, the time spent on de novo model
building and refinement is often far greater than that needed to
determine the reconstruction itself. While many tools from X-ray
crystallography can be applied to model building and refinement
against cryo-EM density maps, they require significant
modifications (Amunts et al., 2014; Brown et al., 2015). Also,
traditional validation criteria in X-ray crystallography, such as
the free R-factor, are not applicable to models built into cryo-EM
density maps. Therefore, tools and methodologies for efficient
model building, refinement, and validation all need further
development.

In addition to improving the technology itself, there are other
factors that limit the wide application of single-particle
cryo-EM. First, the method itself is not yet a “turnkey” method.
Even with automated data acquisition technology and streamlined
data processing, image acquisition and processing are still too
complex for a novice to learn with minimal training or by
studying manuals. Second, the needed infrastructure, including
fully functional cryo-EM equipment and computational resources
for data processing and storage, requires significant financial
investment. In addition to the initial investment, the ongoing
costs required to maintain and operate a high-end cryo-EM
facility are significant. Third, there are currently too few
synchrotron-like cryo-EM facilities dedicated to high-throughput
cryo-EM data acquisition for the community at large. These
limitations set the threshold for entering the field far too
high, and improving access will require efforts from multiple
parties. Therefore, making the technology robust and relatively
easy to learn, reducing the equipment and operational costs, and
providing access to ready-to-use facilities staffed with experts
will all be important steps toward making cryo-EM as widely used
as X-ray crystallography. While the future of single-particle
cryo-EM is bright, realizing it will require strong support from
the scientific community as well as from funding agencies.
SUMMARY

Mitochondrial diseases include a group of maternally 
inherited genetic disorders caused by mutations in 
mtDNA. In most of these patients, mutated mtDNA 
coexists with wild-type mtDNA, a situation known as 
mtDNA heteroplasmy. Here, we report on a strategy 
toward preventing germline transmission of mito- 
chondrial diseases by inducing mtDNA heteroplasmy 
shift through the selective elimination of mutated 
mtDNA. As a proof of concept, we took advantage of 
NZB/BALB heteroplasmic mice, which contain two 
mtDNA haplotypes, BALB and NZB, and selectively 
prevented their germline transmission using either 
mitochondria-targeted restriction endonucleases or 
TALENs. In addition, we successfully reduced human 
mutated mtDNA levels responsible for Leber’s hered- 
itary optic neuropathy (LHOND), and neurogenic mus- 
cle weakness, ataxia, and retinitis pigmentosa (NARP), 
in mammalian oocytes using mitochondria-targeted 
TALENs (mito-TALENs). Our approaches represent a
potential therapeutic avenue for preventing the trans- 
generational transmission of human mitochondrial 
diseases caused by mutations in mtDNA. 

INTRODUCTION 

Mitochondria are double-membrane cellular organelles of 
bacterial origin that play fundamental roles in multiple cellular
processes including energy production, calcium homeostasis,
cellular signaling, and apoptosis (Dyall et al., 2004). Mitochondria 
contain their own mtDNA encoding 13 polypeptides of the
mitochondrial respiratory chain as well as the tRNAs and rRNAs
necessary for their synthesis (Anderson et al., 1981). mtDNA is
present in multiple copies per cell, ranging from approximately
1,000 copies in somatic cells to several hundred thousand copies
in oocytes, with an average of 1-10 copies per organelle (Shoubridge and
Wai, 2007). In contrast to nuclear DNA, mtDNA is exclusively 
transmitted through maternal inheritance. Diseases resulting 
from mitochondrial dysfunction caused by mtDNA mutations 
affect 1 in 5,000 children (Haas et al., 2007), and it is estimated 
that 1 in 200 women could be a mitochondrial disease carrier. 
Due to the fundamental role of mitochondria in energy produc- 
tion, mitochondrial diseases correlate with degeneration of tis- 
sues and organs with high-energy demands. This leads to myop- 
athies, cardiomyopathies, and encephalopathies, among other 
phenotypes (Taylor and Turnbull, 2005). Currently, there is no 
cure for mitochondrial diseases. Genetic counseling and pre-im- 
plantation genetic diagnosis (PGD) represent the only therapeu- 
tic options for preventing transmission of mitochondrial diseases 
caused by mtDNA mutations. However, due to the non-Mende- 
lian segregation of mtDNA, PGD can only partially reduce the risk 
of transmitting the disease (Brown et al., 2006). Moreover, anal- 
ysis of multiple blastomeres may compromise embryo viability. 
Recently, mitochondrial replacement techniques by spindle, 
pronuclear, or polar body genome transfer into healthy enucle- 
ated donor oocytes or embryos have been reported (Craven 
et al., 2010; Pauli et al., 2013; Tachibana et al., 2013; Wang 
et al., 2014). Application of these techniques implies combining 
genetic material from three different individuals, which has 
raised ethical, safety, and medical concerns (Hayden, 2013;
Vogel, 2014). Therefore, alternative and complementary ap- 
proaches that alleviate or eliminate these concerns should be 
investigated when devising feasible clinical paths toward pre- 
venting the transmission of mitochondrial diseases caused by 
mtDNA mutations. 

Due to the thousands of copies of mtDNA contained within a 
cell, the levels of mutated mtDNA can vary. The term homo- 
plasmy refers to the presence of a single mtDNA haplotype 
in the cell, whereas heteroplasmy refers to the coexistence of 
more than one mtDNA haplotype. When the percentage of 
mutated mtDNA molecules exceeds a threshold that compro- 
mises mitochondrial function, a disease state may ensue (Tay- 
lor and Turnbull, 2005; Wallace and Chalkia, 2013). Threshold 
levels for biochemical and clinical defects are generally in the 
range of 60%-95% mutated mtDNA depending on the severity 
of the mutation (Russell and Turnbull, 2014). Changes in the 
relative levels of heteroplasmic mtDNA can be referred to as 
mtDNA heteroplasmy shifts. Although mitochondria possess all the
necessary machinery for homologous recombination and
non-homologous end joining, these pathways do not seem to be the
major route for mtDNA repair in mammalian mitochondria (Alexeyev
et al., 2013). Previous studies have
demonstrated that the relative levels of mutated and wild- 
type mtDNA can be altered in patient somatic cells containing 
the m.8993T>G mtDNA mutation responsible for the NARP and 
MILS syndromes, where elimination of mutated mtDNA led to 
the restoration of normal mitochondrial function (Alexeyev 
et al., 2008). Similarly, using the heteroplasmic NZB/BALB 
mouse model that carries two different mtDNA haplotypes 
(NZB and BALB), BALB mtDNA, which contains a unique ApaLI 
site, has been specifically reduced in vivo using a mitochon- 
dria-targeted ApaLI (Bacman et al., 2012; 2010). Recently, 
transcription activator-like effector nucleases (TALENs) and 
zinc finger nucleases (ZFNs) targeted to mitochondria have 
been utilized for the specific elimination of mitochondrial
genomes carrying mutations responsible for mitochondrial diseases
(Bacman et al., 2013; Gammage et al., 2014; Minczuk
et al., 2006; 2008). These novel approaches allow for the 
targeting of a wider spectrum of mutations against which re- 
striction endonucleases could not be used. However, these 
approaches do not provide mechanisms for preventing the 
transmission of mutated mtDNA nor do they allow for a com- 
plete systemic clearance of mtDNA mutations in subsequent 
generations. 
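The definitions of homoplasmy, heteroplasmy, and threshold above can be captured in a few lines. This is an illustrative sketch: the function name is hypothetical, and because the text gives a mutation-dependent threshold range of 60%-95%, any single threshold passed in is an assumption.

```python
def heteroplasmy_status(mutant_copies: int, wild_type_copies: int, threshold: float):
    """Return (state, mutant_fraction, exceeds_threshold) for one cell or sample.

    `threshold` is the mutation-specific fraction of mutated mtDNA above which
    a biochemical defect is expected (60%-95% in the text); the value chosen
    by the caller is an assumption.
    """
    total = mutant_copies + wild_type_copies
    if total <= 0:
        raise ValueError("sample contains no mtDNA copies")
    fraction = mutant_copies / total
    # A single haplotype (all mutant or all wild-type) means homoplasmy.
    state = "homoplasmic" if fraction in (0.0, 1.0) else "heteroplasmic"
    return state, fraction, fraction > threshold


# 700 mutated and 300 wild-type copies: a 70% mutation load, which exceeds a
# 60% threshold but not a 90% one.
print(heteroplasmy_status(700, 300, threshold=0.60))
print(heteroplasmy_status(700, 300, threshold=0.90))
```

The same 70% load can therefore be pathogenic for a severe mutation with a low threshold and silent for a milder one, which is why lowering the mutant fraction below threshold, rather than eliminating it entirely, can suffice.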

Here, we report on the specific reduction of mitochondrial 
genomes in the germline for preventing transmission of mito- 
chondrial diseases. As a proof of concept, and by using the het- 
eroplasmic NZB/BALB mouse model, we specifically reduced 
BALB or NZB mitochondrial genomes in the germline using 
mitochondria-targeted restriction endonucleases and TALENs 
and prevented their transmission to the next generation. More- 
over, we successfully reduced mutated mitochondrial genomes 
responsible for human mitochondrial diseases in mouse oo- 
cytes using mitochondria-targeted nucleases. The approaches 
presented here may be applied and developed to prevent 
the transgenerational transmission of human mitochondrial 
diseases. 



RESULTS 

Specific Reduction of Mitochondrial Genomes in 
Oocytes and Embryos Using Restriction Endonucleases 

With the goal of establishing an alternative therapeutic approach 
for preventing the germline transmission of mitochondrial dis- 
eases caused by mtDNA mutations, we tested the specific elim- 
ination of BALB mtDNA in NZB/BALB oocytes and one-cell 
embryos. For this purpose, we generated a mammalian
codon-optimized ApaLI targeted to mitochondria by the ATP5B
mitochondria targeting sequence and the ATP5B 5' and 3' UTRs to
promote co-translational import from mitochondria-associated
ribosomes (Marc et al., 2002). An enhanced GFP (EGFP) reporter
was also included in the construct to monitor expression
(Figure 1A). First, we tested the mitochondrial localization of
the ApaLI protein generated from the construct by immunostaining
in NZB/BALB tail tip fibroblasts (TTFs) and observed robust
co-localization of mitochondria-targeted ApaLI (mito-ApaLI) with
the mitochondrial dye Mitotracker (Figure S1A). In contrast, we
failed to observe mitochondrial localization of
non-mitochondria-targeted ApaLI (Figure S1A). Analysis of mtDNA
by “last-cycle hot” PCR and restriction fragment length
polymorphism (RFLP) demonstrated induction of heteroplasmy shift
by specific reduction of BALB mtDNA in cells transfected with
mito-ApaLI compared to control cells transfected with mito-GFP
after 72 hr (Figure S1B). In addition, we found normal mtDNA copy
number in mito-ApaLI-transfected cells, which resulted from the
replication of the remaining NZB mtDNA that compensated for the
reduction of BALB mtDNA (Figure S1C).
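Heteroplasmy estimates from an RFLP gel reduce to comparing cut and uncut band signals. The sketch below assumes that "last-cycle hot" labeling incorporates label uniformly along each newly synthesized strand, so band intensity scales with fragment length and must be divided by length to obtain relative molar amounts. The amplicon and fragment sizes in the example are hypothetical, not the ones used in the study. Because only BALB mtDNA carries the ApaLI site, the cut fraction corresponds to the BALB heteroplasmy level.

```python
def cut_fraction(uncut_signal, uncut_len, fragment_signals, fragment_lens):
    """Molar fraction of amplicons cleaved by the restriction enzyme.

    Assumes band intensity is proportional to (molar amount x fragment length),
    as expected if label is incorporated uniformly along each labeled strand.
    """
    uncut = uncut_signal / uncut_len
    # Each cut amplicon yields one copy of every fragment; average their estimates.
    cut = sum(s / length for s, length in zip(fragment_signals, fragment_lens))
    cut /= len(fragment_lens)
    return cut / (cut + uncut)


# Hypothetical 1,000 bp amplicon cut into 600 bp + 400 bp fragments at the
# ApaLI site, with band intensities in arbitrary phosphorimager units.
print(cut_fraction(700, 1000, [180, 120], [600, 400]))
```

Without the length correction, the shorter fragments would be systematically under-weighted relative to the uncut band.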

We next decided to test whether a similar approach could be used
in oocytes to specifically eliminate BALB mtDNA (Figure 1A).
First, we confirmed the mitochondrial localization of mito-ApaLI
in NZB/BALB metaphase II (MII) oocytes injected with mRNA encoding
mito-ApaLI by immunostaining (Figure 1B). As expected, mito-ApaLI
co-localized with Mitotracker in MII oocytes (Figure 1B). RFLP
analysis 48 hr after mito-ApaLI mRNA injection demonstrated the
specific reduction of BALB mtDNA and a consequent increase in the
relative NZB mtDNA levels (Figure 1C). In agreement with the lack
of mtDNA replication in mature oocytes and pre-implantation
embryos (Wai et al., 2010), analysis of mtDNA copy number by qPCR
revealed a decrease in mtDNA copy number following mito-ApaLI
injection proportional to the initial levels of BALB mtDNA
(Figure 1D). To verify the reduction of BALB mtDNA, we performed
RFLP and qPCR analyses by amplification of an independent region
of the mtDNA containing a unique HindIII site, exclusively present
in BALB mtDNA. These analyses confirmed the specific reduction of
BALB mtDNA upon injection of mito-ApaLI in NZB/BALB MII oocytes
(Figures S1D and S1E). Injection of mito-ApaLI in BALB or NZB
single-haplotype oocytes resulted in complete depletion of mtDNA
in BALB oocytes and did not affect mtDNA levels in NZB oocytes,
reinforcing the specificity of mito-ApaLI (Figure S1F).
Collectively, these results suggest the potential of this
approach for the specific reduction of mtDNA in the germline.
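The contrast between fibroblasts (normal copy number after cleavage) and oocytes (copy number falling in proportion to the initial BALB level) follows from simple bookkeeping. In this sketch the class and function names are hypothetical, and it assumes that cleaved molecules are degraded rather than repaired, and that, where replication occurs, surviving copies are replicated without haplotype bias until the original copy number is restored.

```python
from dataclasses import dataclass


@dataclass
class MtdnaPool:
    copies: float            # total mtDNA copies in the cell
    target_fraction: float   # fraction carrying the nuclease-cleavable haplotype


def after_cleavage(pool: MtdnaPool, efficiency: float, replication: bool) -> MtdnaPool:
    """Remove a fraction `efficiency` of the cleavable copies.

    replication=False mimics mature oocytes and pre-implantation embryos, where
    mtDNA is not replicated; replication=True assumes unbiased replication of
    the surviving copies back to the original copy number.
    """
    target = pool.copies * pool.target_fraction * (1.0 - efficiency)
    other = pool.copies * (1.0 - pool.target_fraction)
    remaining = target + other
    if remaining == 0:
        return MtdnaPool(0.0, 0.0)  # nothing left to replicate from
    copies = pool.copies if replication else remaining
    return MtdnaPool(copies, target / remaining)


# Same 40% starting load and 50% cleavage efficiency in both settings.
oocyte = after_cleavage(MtdnaPool(200_000, 0.4), efficiency=0.5, replication=False)
fibroblast = after_cleavage(MtdnaPool(2_000, 0.4), efficiency=0.5, replication=True)
print(oocyte, fibroblast)
```

Both settings end at the same 25% target fraction, but only the non-replicating oocyte loses copies (here 20% of them, i.e., efficiency × initial target fraction). A single-haplotype pool that is fully cleavable is depleted completely, while a pool lacking the site is untouched, matching the behavior described for BALB and NZB oocytes.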

In addition to oocytes, we tested whether mtDNA heteroplasmy shift
could be induced in one-cell embryos without affecting their
normal development until the blastocyst stage (Figure 2A). For
this purpose, NZB/BALB one-cell embryos were injected with
mito-ApaLI mRNA. Time-lapse fluorescence microscopy images
revealed the expression of mito-ApaLI, indicated by EGFP
expression, and, more importantly, normal development of
mito-ApaLI-injected embryos through the different developmental
stages analyzed (Figure 2B). Similar to the results observed in
oocytes, RFLP analysis of mito-ApaLI blastocysts demonstrated
specific reduction of BALB mtDNA and an increase in the relative
levels of NZB mtDNA (Figure 2C). Moreover, due to the lack of
mtDNA replication until the blastocyst stage (Wai et al., 2010),
analysis of mtDNA copy number by qPCR showed a decrease in mtDNA
levels proportional to the BALB mtDNA levels (Figure 2D). RFLP and
qPCR analyses at the HindIII region confirmed the specific
reduction of BALB mtDNA upon injection of mito-ApaLI in NZB/BALB
embryos (Figures S2A and S2B).

Preventing the Transmission of Mitochondrial Genomes Using Mitochondria-Targeted Restriction Endonucleases

Next, we investigated whether induction of mtDNA heteroplasmy
shift could be utilized for preventing the transmission of
mitochondrial diseases to the next generation. NZB/BALB one-cell
embryos injected with mito-ApaLI mRNA were cultured in vitro until
the blastocyst stage and transferred to pseudopregnant mice
(Figure 3A). After a standard gestation period, pseudopregnant
mice gave birth to live pups through natural delivery
(Figure 3B). Most importantly, RFLP analysis of total DNA from F1
mito-ApaLI animals revealed a significant reduction of BALB mtDNA
(Figure 3C). Further analysis demonstrated reduction of BALB mtDNA
in the brain, muscle, heart, and liver, indicating the systemic
clearance of a specific mtDNA haplotype in the offspring of
heteroplasmic mothers (Figure 3D). Similarly, analysis at the
HindIII region confirmed the specific reduction of BALB mtDNA in
F1 mito-ApaLI animals (Figures S3A and S3B). Furthermore, analysis
of mtDNA copy number showed normal mtDNA levels resulting from NZB
mtDNA replication upon embryo implantation (Figure 3E).
Comprehensive characterization of mito-ApaLI animals, both males
and females, showed normal development and weight gain
(Figure 4A), normal complete blood counts (Table S1), and normal
blood levels of glucose and lactate (Figure 4B), all potential
indicators of mitochondrial dysfunction (Haas et al., 2007).
Moreover, typical behavioral studies indicative of CNS defects
(Ross et al., 2013), including open field, rotarod, grip strength,
and sensory neuron screening, showed normal performance of
mito-ApaLI animals (Figures 4C-4E).

Figure 1. Heteroplasmy Shift in NZB/BALB MII Oocytes Using
mito-ApaLI

(A) Injection of mito-ApaLI mRNA in oocytes for induction of
heteroplasmy shift.

(B) Mitochondrial co-localization of mito-GFP and mito-ApaLI with
Mitotracker in injected oocytes by immunofluorescence. Scale bars,
10 µm.

(C) RFLP analysis and quantification of mtDNA heteroplasmy in
control and mito-ApaLI-injected MII oocytes after 48 hr (Control
n = 16; mito-ApaLI n = 12). Representative gel.

(D) Quantification of mtDNA copy number by qPCR in control and
mito-ApaLI-injected MII oocytes after 48 hr (Control n = 12;
mito-ApaLI n = 12).

Error bars represent ± SEM. ****p < 0.0001. See also Figure S1.

Figure 2. Heteroplasmy Shift in NZB/BALB Embryos Using mito-ApaLI

(A) Injection of mito-ApaLI mRNA in one-cell embryos for induction
of heteroplasmy shift.

(B) In vitro development of mito-ApaLI-injected embryos to
blastocyst stage. Time-lapse images of EGFP reporter expression at
different developmental stages.

(C) RFLP analysis and quantification of mtDNA heteroplasmy in
control and mito-ApaLI-injected embryos (Control n = 10;
mito-ApaLI n = 8). Representative gel.

(D) Quantification of mtDNA copy number by qPCR in control and
mito-ApaLI-injected embryos (Control n = 18; mito-ApaLI n = 12).

Error bars represent ± SEM. ***p < 0.001. ****p < 0.0001. See also
Figure S2.

To assess potential off-target effects on the nuclear genome, we
performed comparative genomic hybridization (CGH) array analysis
and exome sequencing. CGH array analysis indicated normal genomic
integrity of mito-ApaLI animals (Figure S3C). Confirming this
result, exome sequencing demonstrated variant rates in
ApaLI-containing exomic regions comparable to those in non-ApaLI
exomic regions (0.0014 versus 0.0047 variants per hundred base
pairs, respectively), excluding the possibility of off-target
effects of mito-ApaLI. Furthermore, mito-ApaLI animals were
fertile, and RFLP analyses showed barely detectable levels of BALB
mtDNA in the F2 generation (Figures 4F and S4). These results
confirm the feasibility of mtDNA heteroplasmy shift to prevent the
transgenerational transmission of mitochondrial diseases.
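The variant-rate comparison above can be sketched as follows. The presumed rationale is that, if mito-ApaLI cleaved nuclear DNA, error-prone repair would raise the variant rate specifically in regions containing its recognition sequence (GTGCAC). The input format (plain sequences paired with variant counts) and the function name are hypothetical simplifications; a real analysis works from aligned reads and variant calls.

```python
APALI_SITE = "GTGCAC"  # ApaLI recognition sequence; palindromic, so one strand suffices


def variant_rates(regions):
    """Variants per 100 bp in exomic regions with vs. without an ApaLI site.

    `regions` is a list of (sequence, variant_count) pairs, a hypothetical
    stand-in for real exome intervals and their variant calls.
    """
    def rate(group):
        bases = sum(len(seq) for seq, _ in group)
        return 100.0 * sum(count for _, count in group) / bases if bases else float("nan")

    with_site = [r for r in regions if APALI_SITE in r[0].upper()]
    without_site = [r for r in regions if APALI_SITE not in r[0].upper()]
    return rate(with_site), rate(without_site)


# Toy input: one 100 bp region containing the site (no variants) and one
# 200 bp region without it (one variant).
toy_regions = [("AAGTGCACTT" * 10, 0), ("ACGT" * 50, 1)]
print(variant_rates(toy_regions))
```

A site-containing rate that is not elevated relative to the background rate is the pattern expected in the absence of nuclear off-target cleavage.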

Preventing the Transmission of Mitochondrial Genomes Using Mito-TALENs

Despite the broad range of more than 200 mtDNA mutations
associated with mitochondrial diseases, only the human mutation
m.8993T>G, which is responsible for two mitochondrial diseases
(neurogenic muscle weakness, ataxia, and retinitis pigmentosa
[NARP] and maternally inherited Leigh syndrome [MILS]), generates
a unique restriction site that can be targeted using the naturally
occurring restriction endonuclease XmaI. For this reason,
alternative approaches to induce heteroplasmy shift based on
mitochondria-targeted transcription activator-like effector
nucleases (TALENs) and zinc finger nucleases (ZFNs), which can be
designed against virtually any mutation, have recently been
developed by us and other groups (Bacman et al., 2013; Gammage
et al., 2014; Minczuk et al., 2006; 2008). In order to evaluate the use of
mito-TALENs to prevent the transmission of mitochondrial diseases,
we tested the specific elimination of NZB mtDNA in NZB/BALB
oocytes. For this purpose, we first generated a collection of
TALENs against NZB mtDNA and screened for the TALEN with the
highest specificity for NZB mtDNA (Figures S5A-S5C). In our
design, the left TALEN monomer binds to a sequence common to NZB
and BALB mtDNA, whereas the right monomer preferentially
recognizes and binds to NZB mtDNA, so that dimerization of the
FokI nuclease domains directs cleavage specifically to NZB mtDNA
(Figure S5A). NZB TALEN monomers were targeted to mitochondria by
the human ATP5B and SOD2 mitochondrial targeting sequences and by
the ATP5B and SOD2 5' and 3' UTRs, which promote co-translational
import from mitochondria-associated ribosomes (Marc et al., 2002).
In addition, an EGFP or mCherry reporter was included in the
construct encoding each TALEN monomer (Figure 5A). As before, we
tested the mitochondrial localization of the NZB TALEN by
immunostaining in NZB/BALB tail tip fibroblasts (TTFs) and
observed robust co-localization of the mitochondria-targeted NZB
TALEN monomers (hereafter NZB mito-TALEN) with MitoTracker
(Figure S5D). RFLP analysis of mtDNA demonstrated induction of
heteroplasmy shift in NZB/BALB cells, with a specific reduction of
NZB mtDNA 72 hr after transfection with NZB mito-TALENs compared
to control cells transfected with mito-GFP (Figure S5E). In
addition, as with mito-ApaLI, mtDNA copy number was normal in NZB
mito-TALEN-transfected cells, consistent with replication of the
remaining BALB mtDNA compensating for the loss of NZB mtDNA
(Figure S5F).
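The design logic described above (one arm on sequence shared by both haplotypes, the other arm overlapping a haplotype-specific difference) can be checked mechanically on aligned sequences. A minimal sketch, assuming hypothetical toy sequences and window coordinates (not the actual NZB/BALB target), and assuming the right arm binds the bottom strand so that its recognition sequence is the reverse complement of its top-strand window; FokI dimerization in the spacer is not modeled.

```python
# Hypothetical sketch of the TALEN haplotype-specificity check; toy sequences only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def arm_mismatches(target: str, other: str, left, right):
    """left/right are (start, length) windows on the top strand of aligned haplotypes.
    The left arm reads the top strand; the right arm is assumed to bind the bottom
    strand, so its recognition sequence is the reverse complement of its window."""
    (ls, ll), (rs, rl) = left, right
    left_mm = mismatches(target[ls:ls + ll], other[ls:ls + ll])
    right_mm = mismatches(revcomp(target[rs:rs + rl]), revcomp(other[rs:rs + rl]))
    return left_mm, right_mm


def is_haplotype_specific(target, other, left, right) -> bool:
    """True if the left arm is shared by both haplotypes and the right arm differs."""
    left_mm, right_mm = arm_mismatches(target, other, left, right)
    return left_mm == 0 and right_mm > 0


if __name__ == "__main__":
    # Toy aligned sequences that differ at a single position inside the right-arm window.
    nzb_like = "TCGATCCGTA" + "GCTAGCTAGCTAGC" + "AATTCGCATG"
    balb_like = "TCGATCCGTA" + "GCTAGCTAGCTAGC" + "AATTTGCATG"
    left_arm, right_arm = (0, 10), (24, 10)  # 14-bp spacer between the arms
    print(arm_mismatches(nzb_like, balb_like, left_arm, right_arm))
    print(is_haplotype_specific(nzb_like, balb_like, left_arm, right_arm))
```

In a real screen, a single mismatch in one arm does not guarantee discrimination; the text notes that candidate TALENs were screened empirically for specificity.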

We next tested whether mito-TALENs could be used in oocytes to
specifically eliminate NZB mtDNA (Figure 5A). Fluorescence
microscopy revealed expression of both NZB mito-TALEN monomers, as
indicated by EGFP and mCherry fluorescence in oocytes (Figure 5B).
RFLP analysis 48 hr after NZB mito-TALEN mRNA injection
demonstrated a specific decrease of NZB mtDNA and a consequent
increase in the relative level of BALB mtDNA (Figure 5C). RFLP
analysis at the HindIII site confirmed the specific reduction of
NZB mtDNA upon injection of NZB mito-TALEN in NZB/BALB MII oocytes
(Figure S5G). qPCR analysis revealed a decrease in mtDNA copy
number following NZB mito-TALEN injection, consistent with the
absence of mtDNA replication in oocytes (Figure 5D). These results
demonstrate the potential of custom-designed mito-TALENs for the
specific elimination of mitochondrial genomes in the germline,
with the aim of preventing the transmission of mitochondrial
diseases.
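The contrast between transfected cells, where total mtDNA copy number was maintained, and oocytes, where it fell, follows from simple bookkeeping. The toy model below assumes that the nuclease destroys a fixed fraction of the targeted genomes, leaves the other haplotype untouched, and that replicating cells restore the original total without changing haplotype proportions while oocytes do not replicate mtDNA at all. The function name and the numbers are hypothetical and are not fitted to Figures 5 or S5.

```python
def heteroplasmy_shift(n_target, n_other, cut_fraction, replicates):
    """Toy model of nuclease-induced heteroplasmy shift.

    n_target: copies of the targeted haplotype (e.g. NZB) before cleavage
    n_other: copies of the untargeted haplotype (e.g. BALB)
    cut_fraction: assumed fraction of target copies destroyed (0..1)
    replicates: True for replicating cells, assumed to restore the original
        total copy number without changing haplotype proportions;
        False for oocytes, which do not replicate mtDNA.
    Returns (target_fraction_after, total_copies_after).
    """
    if not 0.0 <= cut_fraction <= 1.0:
        raise ValueError("cut_fraction must be between 0 and 1")
    remaining_target = n_target * (1.0 - cut_fraction)
    remaining_total = remaining_target + n_other
    if remaining_total == 0:
        return 0.0, 0.0
    target_fraction = remaining_target / remaining_total
    if replicates:
        # Surviving genomes repopulate back to the original set point.
        return target_fraction, float(n_target + n_other)
    # No replication: the lost copies are simply gone.
    return target_fraction, remaining_total


if __name__ == "__main__":
    print(heteroplasmy_shift(60000, 40000, 0.5, replicates=False))  # fraction 3/7, total 70000.0
    print(heteroplasmy_shift(60000, 40000, 0.5, replicates=True))   # fraction 3/7, total 100000.0
```

In both cases the haplotype ratio shifts identically; only the total copy number differs, which is the pattern reported for cells (Figure S5F) versus oocytes (Figure 5D).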

Figure 3. Generation of Live Animals after Induction of Heteroplasmy Shift in NZB/BALB Embryos Using mito-ApaLI

(A) Outline for the generation of live animals after injection of mito-ApaLI mRNA in one-cell embryos.

(B) Representative photograph of F1 mito-ApaLI mice.

(C) RFLP analysis and quantification of mtDNA heteroplasmy in tail tip biopsies of embryo donors and generated F1 mito-ApaLI pups (Donor n = 10; mito-ApaLI n = 9).

(D) RFLP analysis and quantification of mtDNA heteroplasmy in tail, brain, muscle, heart, and liver of F1 mito-ApaLI mice.

(E) Quantification of mtDNA copy number by qPCR in F1 mito-ApaLI pups (Donor n = 10; F1 mito-ApaLI n = 9).

Error bars represent ± SEM. ****p < 0.0001. See also Figure S3.

Specific Reduction of Human Mutated Mitochondrial
Genomes Responsible for Mitochondrial Diseases in
Mammalian Oocytes

In order to evaluate the potential of our approach to prevent the
transmission of human mitochondrial diseases, we tested
mitochondria-targeted nucleases against mutated mitochondrial
genomes responsible for two mitochondrial diseases: Leber's
hereditary optic neuropathy and dystonia (LHOND) and NARP (Jun
et al., 1994; Taylor and Turnbull, 2005). Because of the limited
number of available patients and the difficulty of obtaining
oocytes from them, we generated artificial mammalian oocytes
carrying mutated human mtDNA by fusing patient cells with mouse
oocytes using Sendai virus (Figure 6A). Although this model has
limitations compared to patient oocytes, it allowed us to test the
potential of our methodology for the specific elimination of
pathogenic human mtDNAs in mammalian oocytes.

For this purpose, we first tested the fusion of 143B
osteosarcoma cybrid cells harboring the LHOND m.14459G>A mutation
to mouse MII oocytes (Figure 6B). After 3 hr, fusion was complete
and no individual cells were detected under the zona pellucida of
the oocytes (Figure 6B). LHOND-fused oocytes were incubated for
48 hr and collected for analysis. PCR using primers specific for
the human mtDNA region containing the LHOND m.14459G>A mutation
detected LHOND mtDNA in fused oocytes (Figure S6A). Next, we
tested whether the LHOND mito-TALEN that we recently reported
(Bacman et al., 2013) could be used for the specific elimination
of LHOND mtDNA in oocytes. To this end, MII oocytes harboring
LHOND mtDNA were injected with mRNA encoding the LHOND mito-TALEN
3 hr after cell fusion. Fluorescence microscopy revealed
expression of both LHOND mito-TALEN monomers, as indicated by EGFP
and mCherry fluorescence (Figure S6B). RFLP analysis 48 hr after
mRNA injection demonstrated a specific reduction of LHOND mtDNA in
fused oocytes (Figure 6C). qPCR analysis of mtDNA copy number
confirmed a significant reduction of human mutated LHOND mtDNA
upon injection of LHOND mito-TALENs in fused oocytes (Figure 6D).

Figure 4. Characterization of F1 mito-ApaLI Mice

(A) Body weight of mito-ApaLI males (Control n = 5 and mito-ApaLI n = 3) and mito-ApaLI females (Control n = 5 and mito-ApaLI n = 6) at different time points. ns, non-significant.

(B) Biochemical analysis of glucose and lactate in blood of control (n = 10) and mito-ApaLI (n = 9) mice. ns, non-significant.

(C) Open field test measuring baseline levels of locomotor activity in freely moving mice, quantifying distance traveled, ambulatory counts, and vertical counts.

(D) Rotarod test evaluating locomotor coordination based on the latency to fall from a gradually accelerating spinning rod.

(E) Grip strength test measuring average and maximum grip force in the forelimbs.

(F) RFLP analysis and quantification of mtDNA heteroplasmy in tail tip biopsies of F2 mito-ApaLI pups (F2 mito-ApaLI n = 12).

Error bars represent ± SEM. See also Figure S4 and Table S1.

Figure 5. Heteroplasmy Shift in NZB/BALB MII Oocytes Using NZB Mito-TALEN

(A) Injection of NZB mito-TALEN mRNA in oocytes for induction of heteroplasmy shift.

(B) Expression of fluorescent reporters of NZB TALEN monomers in MII oocytes.

(C) RFLP analysis and quantification of mtDNA heteroplasmy in control and NZB TALEN-injected oocytes after 48 hr (Control n = 9; NZB TALEN n = 7). Representative gel.

(D) Quantification of mtDNA copy number by qPCR in control and NZB TALEN-injected oocytes after 48 hr (Control n = 16; NZB TALEN n = 8).

Error bars represent ± SEM. **p < 0.01. ***p < 0.001. See also Figure S5.

Finally, to demonstrate the potential of this approach against
other mitochondrial diseases, we used a similar strategy to test
the elimination of human mitochondrial genomes carrying the NARP
m.9176T>C mutation. For this purpose, we
first generated a collection of TALENs against NARP mtDNA and
screened for the TALEN with the highest specificity for the NARP
m.9176T>C mutation (Figures S6C-S6E). NARP mito-TALEN monomers
were targeted to mitochondria by the ATP5B and SOD2 mitochondrial
targeting sequences and the ATP5B and SOD2 5' and 3' UTRs
(Figure 6A). Immunostaining in NARP patient cells revealed robust
co-localization of the mitochondria-targeted NARP mito-TALEN
monomers with the mitochondrial dye MitoTracker (Figure S6F). We
then tested the induction of heteroplasmy shift by the NARP
mito-TALEN in immortalized NARP patient cells. RFLP analysis of
mtDNA demonstrated a heteroplasmy shift, with a reduction in NARP
mtDNA 72 hr after transfection with the NARP mito-TALEN compared
to transfection with mito-GFP (Figure S6G). In addition, mtDNA
copy number was normal in NARP mito-TALEN-transfected cells,
consistent with replication of the remaining mtDNA (Figure S6H).
Next, as for LHOND, we tested the specific elimination of NARP
mitochondrial genomes in oocytes.
As before, patient cells harboring the NARP m.9176T>C mutation
were fused to MII oocytes using Sendai virus, and the oocytes were
injected with NARP mito-TALEN mRNA 3 hr after fusion. EGFP and
mCherry fluorescence indicated expression of both NARP mito-TALEN
monomers in oocytes (Figure S6I). RFLP analysis 48 hr after mRNA
injection demonstrated a specific reduction of NARP mtDNA in fused
oocytes (Figure 6E). qPCR analysis of mtDNA copy number confirmed
a significant reduction of human mutated NARP mtDNA upon injection
of NARP mito-TALENs in fused oocytes (Figure 6F). We speculate
that the low levels of wild-type mtDNA carried by the NARP patient
cells, together with the lack of mtDNA replication in oocytes, may
explain why we did not detect a significant increase in wild-type
human mtDNA upon NARP mito-TALEN injection. Collectively, these
results confirm the potential of custom-designed mito-TALENs for
the specific elimination of clinically relevant mutated
mitochondrial genomes responsible for human mitochondrial diseases
in the germline.

DISCUSSION 

In summary, we report novel strategies for preventing germline
transmission of mitochondrial diseases through the induction of
mtDNA heteroplasmy shift in oocytes and embryos. As a proof of
concept, we used a heteroplasmic mouse model carrying two
different mtDNA haplotypes: NZB and BALB. First, we demonstrated
that injection of mRNA encoding the mitochondria-targeted ApaLI
restriction enzyme into oocytes, as well as into one-cell embryos,
led to the generation of live animals with significantly reduced
levels of the BALB mtDNA haplotype. These animals displayed normal
behavior, development, gross genomic integrity, and fertility.
Moreover, their progeny (F2 generation) maintained significantly
reduced levels of BALB mtDNA. These results demonstrate the
potential of germline heteroplasmy shift to prevent the
transgenerational transmission of mitochondrial genomes. In
addition, injection of mRNA encoding NZB mito-TALEN into oocytes
led to a significant reduction of NZB mtDNA levels. Finally,
fusion of human patient cells carrying mtDNA mutations to mouse
oocytes, followed by injection of mito-TALENs against these
mutations, resulted in a specific reduction in the levels of
mutated mtDNA.

Figure 6. Specific Elimination of Human LHOND m.14459G>A and NARP m.9176T>C Mutations in Mammalian Oocytes Using Mito-TALENs

(A) Fusion of human cells harboring LHOND m.14459G>A and NARP m.9176T>C mutations with mouse MII oocytes followed by the injection of mito-TALENs for induction of heteroplasmy shift.

(B) Representative images of MII oocytes before and after cell fusion.

(C) RFLP analysis and quantification of LHOND heteroplasmy in individual MII oocytes with and without LHOND TALEN injection after 48 hr (Fusion n = 3; Fusion + TALEN n = 3).

(D) Quantification of human mtDNA copy number by qPCR in individual MII oocytes with and without LHOND TALEN injection after 48 hr (Fusion n = 4; Fusion + TALEN n = 4).

(E) RFLP analysis and quantification of NARP heteroplasmy in individual MII oocytes with and without NARP TALEN injection after 48 hr (Fusion n = 7; Fusion + TALEN n = 3).

(F) Quantification of human mtDNA copy number by qPCR in individual MII oocytes with and without NARP TALEN injection after 48 hr (Fusion n = 17; Fusion + TALEN n = 9).

Error bars represent ± SEM. *p < 0.05. ***p < 0.001. See also Figure S6.

[Figure 6 gel panel: lanes 1 2 3 and 1 2 3; WT and Mut. bands; % Mut. mtDNA: 96.8, 96.5, 97.9, 82.2, 53.6, 78.9]

The use of restriction endonucleases to induce heteroplasmy shift
has previously been demonstrated in the NZB/BALB mouse as well as
in patient somatic cells by us and other groups (Alexeyev et al.,
2008; Bacman et al., 2010; 2012). However, the application of
restriction enzymes to clinically relevant mutations is limited to
m.8993T>G, a mutation responsible for some cases of NARP and MILS
that generates a unique restriction site that can be targeted by
the restriction endonuclease XmaI (Alexeyev et al., 2008).
Approaches based on other types of nucleases, including TALENs,
might allow custom-designed targeting of a wider range of human
mtDNA mutations responsible for mitochondrial diseases. Along this
line, several reports have recently demonstrated the use of
mitochondria-targeted TALENs and zinc finger nucleases (ZFNs) for
the specific elimination of mutated mitochondrial genomes in
somatic cells (Bacman et al., 2013; Gammage et al., 2014; Minczuk
et al., 2006; 2008). Compared to mitochondria-targeted restriction
endonucleases, mito-TALENs may be less robust for preventing the
transmission of mitochondrial diseases in the germline. However,
we speculate that their therapeutic use could reduce mutated
mitochondrial genomes below the threshold levels (60%-95%) at
which biochemical and clinical defects manifest (Russell and
Turnbull, 2014). In addition, we anticipate that the future
development and application of more specific and efficient
gene-editing technologies will allow a greater reduction of
mutated mtDNA levels in the germline.
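The threshold argument can be made concrete with a short worked equation. This is an illustrative model, not an analysis from the study: it assumes that the nuclease destroys a fraction $f$ of mutant genomes, leaves wild-type genomes intact, and that no mtDNA replication occurs, as in oocytes. With an initial mutant fraction $h_0$, an initial copy number $N_0$, and a clinical threshold $\tau$ (all symbols introduced here for illustration), the post-cleavage mutant fraction $h_1$, the cleavage efficiency needed to fall below $\tau$, and the remaining copy number $N_1$ are:

```latex
\begin{align}
h_1 &= \frac{h_0(1-f)}{h_0(1-f) + (1-h_0)} = \frac{h_0(1-f)}{1 - f h_0}, \\
h_1 < \tau \;&\iff\; f > \frac{h_0 - \tau}{h_0(1-\tau)} \qquad (\tau < h_0 \le 1,\ \tau < 1), \\
\frac{N_1}{N_0} &= 1 - f h_0 .
\end{align}
```

For example, with $h_0 = 0.90$ and $\tau = 0.60$ (the lower end of the quoted range), $f$ must exceed $0.30/0.36 = 5/6 \approx 0.83$; at exactly that efficiency, $N_1/N_0 = 1 - (5/6)(0.90) = 0.25$, so only a quarter of the original mtDNA copies would remain in a non-replicating oocyte.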

Transmission of mitochondrial diseases by female carriers
correlates directly with the levels of mutated mtDNA present in
their oocytes. In many cases, asymptomatic female carriers with
intermediate mutant loads may produce oocytes with widely varying
levels of mutated mtDNA (Chinnery et al., 2000; Cree et al., 2009).
Because mtDNA does not replicate in oocytes and preimplantation
embryos, targeting mutated mtDNA in oocytes with high mutant loads
using the approach presented here may lead to a dramatic reduction
in mtDNA copy number. In mice, embryos with mtDNA levels below a
specific threshold develop normally during the preimplantation
stages but subsequently fail to implant in the uterus or undergo
developmental arrest (Wai et al., 2010). Consequently, oocytes
containing high levels of mutated mtDNA that are subjected to
heteroplasmy shift may give rise to embryos with mtDNA copy
numbers too low to implant in the uterine wall. In this case,
although heteroplasmy shift may not yield a viable embryo, it
would still achieve the goal of preventing the development and
implantation of embryos with high mutant loads, and thereby the
transmission of mitochondrial diseases to the next generation. In
this scenario, PGD could be used as a complementary approach to
select embryos with mtDNA copy numbers sufficient for
implantation.
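The complementary role proposed for PGD amounts to a two-criterion filter on each embryo: enough total mtDNA to implant, and a mutant load below the disease threshold. A minimal sketch, in which the `Embryo` record, `select_for_transfer`, the measurements, and both cutoffs are hypothetical placeholders (the text quotes a 60%-95% range for the clinical threshold but gives no numeric implantation cutoff):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Embryo:
    embryo_id: str
    mtdna_copies: float     # total mtDNA copies (hypothetical measurement)
    mutant_fraction: float  # fraction of mutated mtDNA, 0..1


def select_for_transfer(embryos: List[Embryo], min_copies: float,
                        max_mutant_fraction: float) -> List[str]:
    """Keep embryos with enough mtDNA to implant AND a mutant load below the
    disease threshold. Both cutoffs are hypothetical inputs."""
    return [
        e.embryo_id
        for e in embryos
        if e.mtdna_copies >= min_copies and e.mutant_fraction < max_mutant_fraction
    ]


if __name__ == "__main__":
    embryos = [
        Embryo("e1", 150_000, 0.30),  # passes both cutoffs
        Embryo("e2", 20_000, 0.10),   # low mutant load, but too few copies after the shift
        Embryo("e3", 120_000, 0.85),  # enough copies, but mutant load too high
    ]
    print(select_for_transfer(embryos, min_copies=40_000, max_mutant_fraction=0.60))
```

Embryo e2 illustrates the scenario described above: a successful heteroplasmy shift that nonetheless leaves too little mtDNA for implantation.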

Due to the non-Mendelian segregation of mtDNA, current
therapeutic approaches, including genetic counseling and PGD, can
only reduce, but not eliminate, the risk of transmitting
mitochondrial diseases (Brown et al., 2006). The recent
development of mitochondrial replacement techniques based on
spindle, pronuclear, or polar body transfer into healthy
enucleated donor oocytes or embryos, soon to be allowed in the UK
and currently under review by US regulatory agencies, represents a
valid and powerful alternative to current approaches (Craven
et al., 2010; Paull et al., 2013; Tachibana et al., 2013; Wang
et al., 2014). Mitochondrial replacement techniques involve a
series of complex technical manipulations to transfer the nuclear
genome between patient and donor oocytes, resulting in embryos
that carry genetic material from three different individuals. For
these reasons, mitochondrial replacement techniques have raised
biological, medical, and ethical concerns (Hayden, 2013; Reinhardt
et al., 2013). Despite their great potential, further studies are
required to show that these techniques are safe in human oocytes.
The approach presented here relies on a single injection of mRNA
into patient oocytes, which is technically simpler and less
traumatic to the oocyte than mitochondrial replacement techniques
(Craven et al., 2010; Paull et al., 2013; Tachibana et al., 2013;
Wang et al., 2014). Importantly, it does not require healthy donor
oocytes, thus avoiding ethical issues related to the presence of
donor mtDNA.

Induction of mtDNA heteroplasmy shift using restriction
endonucleases or TALENs has the potential to eliminate mutated
mitochondrial genomes in the germline and, consequently, to
prevent the transgenerational transmission of mitochondrial
diseases. In addition, because germline mtDNA mutations have
recently been linked to aging (Ross et al., 2013), this strategy
could also be applied to prevent the transmission of mtDNA
variants that may aggravate aspects of human aging and
age-associated diseases.

EXPERIMENTAL PROCEDURES 
Plasmids 

A synthetic gene coding for the ApaLI restriction endonuclease with a C-terminal
HA (hemagglutinin) tag and codon usage optimized for mammalian translation was
purchased from Integrated DNA Technologies (Coralville). To generate the
mito-ApaLI construct, ApaLI was subcloned into the pVAX plasmid containing the
mitochondrial localization signal derived from ATP5B, a unique Flag immunotag at
the N terminus, the 5' and 3' UTRs from ATP5B to localize the mRNA to ribosomes
associated with mitochondria, an independent fluorescent marker to select for
expression (enhanced GFP [EGFP]), and a recoded picornaviral 2A-like sequence
(T2A') between mito-ApaLI and the fluorescent marker. This fragment was then
subcloned into the pcDNA3 plasmid containing a T7 promoter for in vitro
transcription. To generate the mito-GFP construct, EGFP was subcloned into the
pVAX construct described above, lacking the independent fluorescent marker and
the T2A' sequence but containing a T7 promoter. To generate the ApaLI construct,
the ApaLI restriction enzyme was subcloned into the pVAX plasmid described above,
lacking the N-terminal mitochondrial localization signal and the 5' and 3' UTRs
from ATP5B, with a T7 promoter. Cloning was done using the In-Fusion HD cloning
kit (Clontech Laboratories).

Construction of Mito-TALENs 

TALEN target sites for NZB and NARP m.9176T>C were identified using the
TAL Effector-Nucleotide Targeter (TALE-NT) software (Christian et al., 2010).
To increase TALEN specificity, TALENs with targeting sequences of various
lengths, ranging from 7.5 to 13.5 base pairs, were designed. TALENs were
constructed in the TALEN cloning vector of the TALE Toolbox kit from Addgene
(cat#1000000019) (Sanjana et al., 2012), and TALENs recognizing the target
sites were assembled using the Golden Gate Assembly method. Mito-TALENs were
constructed by adding the mitochondrial localization signal derived from ATP5B
or SOD2, a unique immunotag at the N terminus of the mature protein
(hemagglutinin [HA] or Flag), the 5' and 3' UTRs from ATP5B or SOD2, an
independent fluorescent marker to select for expression (EGFP in one monomer
and mCherry in the other), and a recoded picornaviral 2A-like sequence (T2A')
between the mito-TALEN and the fluorescent marker.
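As an illustration of how a target site maps onto a TALE repeat array, the sketch below uses the repeat-variable diresidue (RVD) code commonly used with the TALE Toolbox (NI for A, HD for C, NG for T, NN for G). The function names are hypothetical, the use of NN for G is an assumption (the text does not state which RVDs were used), and the conventional 5' T preceding the site, which is specified by the TALE N-terminal region rather than by a repeat, is assumed to be excluded from the input. Because the last repeat is a half repeat, a site of n + 1 bases corresponds to an "n.5"-repeat TALE, which is how lengths such as 7.5 and 13.5 arise.

```python
# Hypothetical sketch: map a TALE binding site to an RVD array.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}  # NN for G is an assumption


def talen_rvds(site: str) -> list:
    """Return one RVD per base of `site` (5'->3', excluding the 5' T0).
    The last entry corresponds to the C-terminal half repeat."""
    site = site.upper()
    if not site or set(site) - set(RVD_FOR_BASE):
        raise ValueError("site must be a non-empty A/C/G/T string")
    return [RVD_FOR_BASE[base] for base in site]


def repeat_count(site: str) -> float:
    """Repeat number in TALE-length notation: full repeats plus one half repeat."""
    return len(site) - 0.5


if __name__ == "__main__":
    site = "ACGTTGCA"  # toy 8-bp site, not a real mtDNA target
    print(talen_rvds(site))
    print(repeat_count(site))  # the shortest length mentioned in the text
```

In Golden Gate assembly, each RVD in this list would correspond to one monomer module ligated in order; that assembly step is not modeled here.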

Animals 

All animal procedures were performed according to NIH guidelines and
approved by the Committee on Animal Care at the Salk Institute. NZB/BALB
heteroplasmic founder females were originally generated as previously described
(Jenuth et al., 1996). The NZB/BALB colony was maintained by breeding females
with BALB/cByJ males. Tail tip genotyping was routinely performed to exclude
females carrying low levels of either mtDNA haplotype. BALB/c, BALB/cByJ, and
NZB mice were obtained from The Jackson Laboratory.

Cells, Transfection, and Sorting 

Simian virus 40 (SV40)-immortalized NZB/BALB fibroblasts containing NZB and
BALB mtDNA were derived from tail tips of NZB/BALB mice. Human patient cells
harboring the NARP m.9176T>C mutation were obtained by skin biopsy after the
donor gave signed informed consent, with the approval of the Institutional
Review Board of the Hospital Clinic, Spain. Cells were immortalized using SV40
and cultured at 37 °C in DMEM (Invitrogen) containing GlutaMAX, non-essential
amino acids, and 10% fetal bovine serum (FBS). 143B osteosarcoma cybrid cells
harboring the LHOND m.14459G>A mutation were obtained and cultured as
previously described (Bacman et al., 2013). Cells were transfected with
Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions.
After 72 hr, cells were sorted on a BD Influx (Becton, Dickinson and Company)
by gating on single-cell fluorescence using a 488-nm laser with a 505LP, 530/40
filter set for EGFP and a 561-nm laser with a 600LP, 610/20 filter set for
mCherry. Total DNA was extracted from sorted cells using the DNeasy Blood and
Tissue Kit (QIAGEN) following the manufacturer’s protocol.

Single Strand Annealing Reporter Assay 

Please refer to Extended Experimental Procedures. 

Production of mRNA 

In vitro transcription of mRNA was performed using the mMESSAGE mMACHINE
T7 ULTRA kit (Life Technologies) according to the manufacturer’s instructions,
using linearized, gel-purified (QIAGEN) plasmid template. The mRNA was purified
using the MEGAclear kit (Life Technologies) and quantified using a NanoDrop
8000 (Thermo Scientific).

Oocyte Collection and mRNA Injection 

Female mice were superovulated with pregnant mare serum gonadotropin
(PMSG) and human chorionic gonadotropin (hCG). MII oocytes were collected
14 hr after hCG injection in M2 medium (Millipore) and freed of cumulus cells
using hyaluronidase. To collect one-cell embryos, superovulated female mice
were mated to BALB/c males, and fertilized embryos were collected from the
oviduct 18-20 hr after hCG injection. mRNA (50-250 ng/µl) was injected into
the cytoplasm of MII oocytes and fertilized embryos in M2 medium using an
Eppendorf micromanipulator. Injected MII oocytes were cultured in vitro in
KSOM (Millipore) for 48 hr before analysis. Injected embryos were cultured in
KSOM at 37°C under 5% CO2 in air until the blastocyst stage. Blastocysts were
then collected for analysis or transferred to BALB/c pseudopregnant females.
Live pups were obtained by natural delivery.

Cell Fusion 

Cell fusion was achieved using inactivated Sendai virus (GenomeOne,
Cosmo Bio). Sendai virus stock solutions were prepared according to the
manufacturer’s instructions and further diluted 1:20 in cell fusion buffer.
The 143B osteosarcoma cybrid cells harboring the LHOND m.14459G>A mutation and
patient cells harboring the NARP m.9176T>C mutation were used for fusion with
mature MII oocytes. To increase mtDNA content, cells were cultured for 48 hr
before fusion in glucose-free DMEM supplemented with galactose. On the day of
fusion, cells were trypsinized and resuspended in M2 medium. For each MII
oocyte, five cells briefly exposed to Sendai virus were injected under the
zona pellucida. After 3 hr, successfully fused oocytes were selected for
mito-TALEN mRNA injection. Finally, surviving oocytes were cultured in KSOM
for 48 hr before analysis.

Immunofluorescence 

Cells were seeded on coverslips before transfection. Forty-eight hours after
transfection, cells were incubated with 350 nM MitoTracker (Invitrogen) for
30 min. Cells were then fixed with 4% PFA and permeabilized with 0.1% Triton
X-100. After fixation, cells were blocked for 1 hr at room temperature with
1% BSA/PBS. Next, cells were incubated overnight at 4°C with an anti-Flag M2
primary antibody (Sigma) or anti-HA antibody (Millipore). The next day, cells
were washed three times with PBS and incubated for 1 hr at room temperature
with Alexa Fluor 488-conjugated donkey antibodies to goat IgG (Molecular
Probes) or Alexa Fluor 647-conjugated donkey antibodies to mouse IgG, and for
10 min with Hoechst 33342 (0.5 µg/ml in PBS) (Invitrogen). Finally, cells were
washed three times with PBS and mounted using Fluoromount-G (SouthernBiotech).
Confocal images were acquired using a Zeiss LSM 780 laser-scanning microscope
(Carl Zeiss Jena).

“Last-Cycle Hot” PCR and RFLP 

Total DNA from cells, tail biopsies, and oocytes/embryos was used to
determine mtDNA heteroplasmy by “last-cycle hot” PCR using the 5'
fluorescein amidite (FAM)-labeled mtDNA primers listed in Table S2. NZB/BALB
PCR products were digested with ApaLI or HindIII, which cut BALB mtDNA at
positions 5461 (the ApaLI target site) and 9136, respectively. NARP PCR
products were digested with BsrI, which cuts mutated NARP mtDNA at position
9176. LHOND m.14459G>A levels were determined as previously reported (Bacman
et al., 2013). Digested PCR products were separated by electrophoresis on a
12% polyacrylamide gel. The fluorescein signal was detected using a Typhoon
8600 system (Molecular Dynamics), and gels were quantified using ImageQuant
5.2 (Molecular Dynamics).
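The final quantification step reduces to a ratio of labeled band intensities. A minimal sketch, assuming band volumes have already been exported from the gel-analysis software (the function name, background handling, and numbers are hypothetical): with ApaLI or HindIII the cut fraction is the BALB fraction, and with BsrI it is the mutant NARP fraction. The usual rationale for adding label only in the last PCR cycle is to keep heteroduplexes formed in earlier cycles, which can resist digestion, from biasing this ratio.

```python
def percent_cut(cut_bands, uncut_band, background=0.0):
    """Percent of labeled PCR product cleaved by the restriction enzyme.

    cut_bands: intensities of the labeled digestion-product band(s)
    uncut_band: intensity of the undigested band
    background: per-band background to subtract (clipped at zero)
    """
    cut = sum(max(b - background, 0.0) for b in cut_bands)
    uncut = max(uncut_band - background, 0.0)
    total = cut + uncut
    if total == 0:
        raise ValueError("no signal above background")
    return 100.0 * cut / total


if __name__ == "__main__":
    # ApaLI cuts BALB mtDNA, so this is % BALB; with BsrI it would be % mutant NARP.
    print(percent_cut([750.0], 250.0))  # 75.0
    print(percent_cut([1200.0, 300.0], 500.0, background=100.0))
```

Summing over several cut bands covers the case where more than one digestion product carries the label.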

Quantification of mtDNA Copy Number 

Absolute mtDNA copy numbers were quantified by real-time PCR using
iQ SYBR Green on a Bio-Rad iCycler (Bio-Rad). Individual oocytes and embryos
were transferred into lysis buffer (200 mM KOH) and incubated for 10 min at
65°C. The reaction was neutralized by addition of 200 mM HCl. Absolute mtDNA
copy number per 1 µl of lysate was calculated using a standard curve derived
from qPCR amplification of a fragment of the mtDNA genome. First, a standard
curve was generated from a 10-fold serial dilution of a PCR product obtained
with the standard curve primers for each mtDNA region analyzed. Then, to
quantify absolute mtDNA levels, quantitative real-time PCR was performed using
the qPCR primers listed in Table S2.
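The standard-curve method described above can be sketched as a linear fit of Ct against log10(copies) followed by inversion for unknowns. The dilution series and Ct values below are idealized and hypothetical, and the function names are illustrative; a slope near -3.32 corresponds to roughly 100% amplification efficiency.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(ct_values) or len(copies) < 2:
        raise ValueError("need at least two paired standards")
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get copies in the reaction."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    standards = [1e2, 1e3, 1e4, 1e5, 1e6]     # 10-fold dilution series (hypothetical)
    cts = [30.0, 26.68, 23.36, 20.04, 16.72]  # idealized Ct values
    slope, intercept = fit_standard_curve(standards, cts)
    print(round(slope, 2), round(amplification_efficiency(slope), 3))
    print(round(copies_from_ct(25.0, slope, intercept)))
```

The result is copies per reaction; expressing it per µl of lysate would additionally require the volume of lysate added to each reaction, which this section does not state.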



Blood and Plasma Parameters 

Blood was collected by submandibular bleeding. Whole EDTA blood samples were
analyzed in duplicate for complete blood count (CBC) on a Hemavet 950FS Multi
Species Hematology System (Drew Scientific). Plasma glucose concentration was
determined using the Glucose (GO) Assay Kit (Sigma), and plasma lactate
concentration using the Lactate Assay Kit (Sigma), according to the
manufacturer’s instructions. Please refer to Extended Experimental Procedures.

Behavioral Analysis 

Behavioral testing was carried out at the Salk Institute for Biological Studies 
Behavioral Testing Core. Basic sensorimotor function was assessed in the 
Open Field Test, Rotarod, Grip Strength, and Neurological Screen. Please
refer to Extended Experimental Procedures. 

Array Comparative Genomic Hybridization 

aCGH was performed following the Agilent Oligonucleotide Array-Based CGH for
Genomic DNA Analysis (Agilent Technologies, Santa Clara, CA). Please refer to 
Extended Experimental Procedures. 

Exome Capture and High-Throughput Sequencing 

Exome capture was performed using the SeqCap EZ Mouse Exome Design probe pool
(54 Mb, NimbleGen) according to the manufacturer’s protocol. Please refer 
to Extended Experimental Procedures. 

Statistical Evaluation 

Statistical analyses were performed using an unpaired Student’s t test with
Welch’s correction in Prism 6 software (GraphPad). All data are presented as
mean ± SEM and represent a minimum of two independent experiments.
Statistical significance is displayed as *p < 0.05, **p < 0.01,
***p < 0.001, and ****p < 0.0001.
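For readers reproducing the statistics outside Prism, the quantities behind the reported mean ± SEM and the Welch-corrected t test can be computed directly. A minimal sketch using only the Python standard library; the function names and the sample values are hypothetical, and the p-value step is left to a statistics package.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean (sample SD / sqrt(n)); needs at least 2 values."""
    return stdev(values) / math.sqrt(len(values))


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)  # squared standard error of group a
    vb = stdev(b) ** 2 / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0]         # hypothetical measurements
    treated = [4.0, 5.0, 6.0, 7.0]
    print(mean(control), sem(control))
    print(welch_t(control, treated))
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns the two-sided p-value.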

ACCESSION NUMBERS 

The GEO database accession number for the aCGH data sets reported in this
paper is GSE67371. The SRA accession number for the exome sequencing
data sets reported in this paper is SRP056327.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, six
figures, and two tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.03.051.
SUMMARY

HIV-1-neutralizing antibodies develop in most HIV-1-
infected individuals, although highly effective anti-
bodies are generally observed only after years of
chronic infection. Here, we characterize the rate of
maturation and extent of diversity for the lineage
that produced the broadly neutralizing antibody
VRC01 through longitudinal sampling of peripheral
B cell transcripts over 15 years and co-crystal
structures of lineage members. Next-generation
sequencing identified VRC01-lineage transcripts,
which encompassed diverse antibodies organized
into distinct phylogenetic clades. Prevalent clades
maintained characteristic features of antigen recog-
nition, though each evolved binding loops and disul-
fides that formed distinct recognition surfaces. Over
the course of the study period, VRC01-lineage clades
showed continuous evolution, with rates of sub-
stitutions per 100 nucleotides per year, comparable
to that of HIV-1 evolution. This high rate of antibody
evolution provides a mechanism by which antibody
lineages can achieve extraordinary diversity and,
over years of chronic infection, develop effective
HIV-1 neutralization.
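The abstract expresses lineage evolution as a rate of substitutions per 100 nucleotides per year. One common way to estimate such a rate from longitudinal sequences, assumed here and not necessarily the procedure used in this study, is to regress each sequence's divergence from its germline or inferred ancestor against its sampling time and take the slope. A minimal sketch with a hypothetical function name and hypothetical data:

```python
def rate_per_100nt_per_year(years, divergence_pct):
    """Least-squares slope of divergence (substitutions per 100 nt) vs. sampling time (years)."""
    n = len(years)
    if n < 2 or n != len(divergence_pct):
        raise ValueError("need at least two paired (year, divergence) points")
    mean_x = sum(years) / n
    mean_y = sum(divergence_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("sampling times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, divergence_pct))
    return sxy / sxx


if __name__ == "__main__":
    years = [0, 5, 10, 15]                # hypothetical sampling times
    divergence = [12.0, 14.5, 17.0, 19.5]  # hypothetical % nucleotide divergence
    print(rate_per_100nt_per_year(years, divergence))
```

Because divergence is expressed as a percentage, the slope is directly in substitutions per 100 nucleotides per year.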

INTRODUCTION 

HIV-1-neutralizing antibodies to autologous virus develop within
weeks or months of infection (Albert et al., 1990; Richman et al.,
2003; Wei et al., 2003), with serum neutralization generally
evolving over time to gain increased potency and breadth (Gray
et al., 2011; Mikell et al., 2011). In a study of ~200 chronically
HIV-1-infected individuals, serum from ~50% of studied donors
contained antibodies capable of neutralizing ~50% of HIV-1
strains (Hraber et al., 2014). A subset of these individuals de-
velops potent and broadly reactive antibodies (Gray et al.,
2011; Li et al., 2007; Mikell et al., 2011; Walker et al., 2010; Wu
et al., 2006). Isolation and characterization reveal these broadly
neutralizing antibodies to have one or more unusual features,
including long or protruding heavy-chain third complemen-
tarity-determining regions (CDR H3s) (Burton et al., 1994; Walker
et al., 2009; Zhou et al., 2007), domain swapping (Calarese et al.,
2003), unusual post-translational modifications such as tyrosine
sulfation (TyrS) (Huang et al., 2004; Pancera et al., 2010; Pejchal
et al., 2010), poly- or autoreactivity (Haynes et al., 2005), extensive
somatic hypermutation (SHM) (Scheid et al., 2011; Walker et al.,
2011; Wu et al., 2010, 2011), or dependence on framework region
contacts (Klein et al., 2013). These characteristics highlight the
extensive maturation process necessary for most antibodies to
achieve effective HIV-1 neutralization (Burton et al., 2005; Klein
et al., 2013; Mascola and Haynes, 2013; Scheid et al., 2009).

To understand how effective neutralization develops, we and others have investigated the ontogenies of neutralizing antibody lineages using monoclonal antibodies (mAbs) isolated from individual B cells (Bonsignori et al., 2012; Scheid et al., 2011; Walker et al., 2011), next-generation sequencing (NGS) of cross-sectional samples (Wu et al., 2011; Zhou et al., 2013; Zhu et al., 2012, 2013), and NGS of longitudinal samples studied from time of infection (Doria-Rose et al., 2014; Liao et al., 2013). While these studies often focused on revealing the unmutated common ancestors (UCAs) of neutralizing antibody lineages and early antibody maturation, questions remain about the long-term and continuing development of antibody lineages. What is the scope of B cell development in a broadly neutralizing antibody lineage? What are the rates, compositions, extents, and continuities of lineage evolution? What biological mechanisms underlie the development of HIV-1 neutralization?

Here, we investigate the VRC01-antibody lineage, which targets the site of CD4 engagement on HIV-1 (Wu et al., 2010; Zhou et al., 2010) and is a member of a class of antibodies (the VRC01 class) that share similar structural and genetic characteristics (West et al., 2012; Wu et al., 2011; Zhou et al., 2013). We identified dozens of VRC01-lineage antibodies from one donor and describe characteristics of this lineage as it evolved over the course of 15 years. Overall, these results delineate the scope of evolutionary diversity for a persistent antibody lineage. Antibody lineage characteristics identified here, such as multi-clade divergence and a high rate of evolution, may be common to effective HIV-1-neutralizing antibodies and provide insights into the immunological mechanisms that enable their development.

RESULTS 

39 Probe-Identified Antibodies Define Three VRC01-Lineage Clades

The broadly neutralizing antibodies VRC01, VRC02, and VRC03 were previously isolated from an August 2008 sample of donor 45 peripheral blood mononuclear cells (PBMCs), using the resurfaced stabilized core 3 (RSC3) probe (Wu et al., 2010). From additional 2008 samples, other probes were subsequently used to isolate five more VRC01-class antibodies: NIH45-46, NIH45-177, NIH45-243, VRC06, and VRC06b (Li et al., 2012; Scheid et al., 2011). All eight of these antibodies appeared to be somatic variants from a single VRC01 lineage (Zhou et al., 2013).

To gain insight into the scope of the VRC01 lineage, we performed RSC3-specific B cell sorting on PBMCs from April 2002 and August 2008 time points to identify additional lineage members (Figure 1A, left and right). We also performed B cell sorting with a modified gp120 outer-domain probe (Joyce et al., 2013) on PBMCs from January 2008 (Figure 1A, middle). Five antibodies were recovered from the 2002 sample, including a new VRC01-lineage antibody, VRC08, which was substantially different from those identified previously. The CDR H3 of VRC08 was 23 amino acids in length by Kabat definition, substantially longer than those of VRC01 and VRC07 (12 and 16 amino acids, respectively) or VRC03 and VRC06 (14 and 15 amino acids, respectively) (Figure 1B). We assessed neutralization by VRC08 against 195 Env pseudoviruses. VRC08 displayed breadth and potency similar to VRC01 (Figure 1C; Table S1). In addition to VRC08, several other closely related antibodies were identified. In all, we identified 31 new neutralizing antibodies from donor 45, all with naturally paired heavy and light chains and derived from the same origin genes (Figure 1B; Figure S1).

A maximum-likelihood phylogenetic tree was constructed from the concatenated heavy- and light-chain nucleotide sequences of the 39 VRC01-lineage antibodies (Figure 1D); they segregated into three major clades, termed clade 01+07, clade 03+06, and clade 08. Sequence differences within clades were under 25%, while differences exceeded 50% between clades (Figure 1E), and CDR H3 lengths varied from 12 to 23 amino acids.

Tracking Potential VRC01-Lineage Sequences in Longitudinal Samples

To gain a more complete understanding of the VRC01 lineage and its development, we used NGS to identify potential lineage members over 15 years. The first sample available was from March 1995, well after the diagnosis of HIV-1 infection in 1990. Ten samples were analyzed, extending to December 2009. During this 15-year period, donor 45 maintained a relatively stable plasma viral load of around 10,000 copies/ml and a CD4+ T cell count >500 cells/μl without anti-retroviral therapy (Table S2A). cDNA libraries from each time point, corresponding to the transcripts of 3-5 million PBMCs, were used as templates, and 5' gene family-specific primers were used to amplify VH1 genes or Vκ3 genes, with 3' primers for both immunoglobulin M (IgM) and immunoglobulin G (IgG) or immunoglobulin κ (Igκ) constant regions. We estimate that each longitudinal sample contained 75,000-125,000 VH1-family-derived B cells and 100,000-160,000 Vκ3-family-derived B cells. 454 pyrosequencing, with either a half or a full 454 chip, generally yielded between a quarter million and a million raw reads for each heavy- or light-chain reaction at each time point (Figure 2).

A heavy-chain-specific bioinformatics pipeline with multiple computational sieves was used to identify VRC01-lineage transcripts and to classify them into closely related groups of sequences based on CDR H3 identity (Figure S2A, bottom panels). We utilized a modified cross-donor analysis algorithm and applied several levels of data quality filtering and de-replication with a 97.25% identity threshold (Extended Experimental Procedures). These quality filters ameliorated known 454 errors and reduced the number of unique sequences. The effect of each data-filtering step on the number of retained sequences is shown in Table S2B. Similarly, a light-chain-specific pipeline was used to identify and classify VRC01-lineage light-chain transcripts (Figure S2B, bottom panels). As with the heavy chains, we applied data-quality filters (Figure 2B, bottom panels, and Table S2B).
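The de-replication step can be illustrated with a minimal greedy clustering sketch. It assumes reads that are already aligned to equal length and scores them by simple per-position identity; the alignment method, read ordering, and error handling of the actual pipeline are described in the Extended Experimental Procedures and are not reproduced here. The function names `percent_identity` and `dereplicate` are hypothetical.

```python
def percent_identity(a, b):
    """Per-position identity (%) of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def dereplicate(reads, threshold=97.25):
    """Greedy de-replication of a list of reads.

    Distinct sequences are visited from most to least abundant; each joins the
    first existing representative it matches at >= threshold % identity,
    otherwise it becomes a new representative. Returns a dict mapping each
    representative to the number of reads it absorbed.
    """
    counts = {}
    for r in reads:
        counts[r] = counts.get(r, 0) + 1
    # Assumption for this sketch: abundant variants make better centroids,
    # since rare variants are more likely to carry sequencing errors.
    ordered = sorted(counts, key=lambda s: -counts[s])

    reps = {}
    for seq in ordered:
        for rep in reps:
            if len(rep) == len(seq) and percent_identity(rep, seq) >= threshold:
                reps[rep] += counts[seq]
                break
        else:
            reps[seq] = counts[seq]
    return reps


base = "ACGT" * 10            # 40-nt toy read
one_err = base[:-1] + "A"     # 1 mismatch  -> 97.5% identity, merged
two_err = "TT" + base[2:]     # 2 mismatches -> 95.0% identity, kept separate
print(dereplicate([base, base, base, one_err, two_err]))
```

Because one mismatch in 40 nucleotides gives 97.5% identity, the 97.25% threshold absorbs single-error variants of this toy read but keeps the two-mismatch variant separate; for real reads several hundred nucleotides long, the same threshold tolerates proportionally more differences.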

We were able to track the evolution of each clade of the VRC01 lineage through the positions of clusters of closely related sequences on identity/divergence (I-D) plots over time (Figure 2; Figures S2C-S2F). The positions of these clusters at successive time points revealed the persistence and continued SHM of each clade, with a typical increase of 5%-10% in divergence over 10 years and a maximal sequence identity around the time of isolation of the referent antibody.
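Each point on an identity/divergence plot pairs two numbers computed for one read: divergence from the assigned germline V gene (x axis) and identity to a referent antibody (y axis). A minimal sketch, assuming pre-aligned sequences of equal length and simple per-position identity; the paper's alignment and scoring details are not reproduced, and the sequences below are toy strings.

```python
def identity(a, b):
    """Per-position identity (%) of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def id_plot_coordinates(reads, germline, referent):
    """Return one (divergence from germline %, identity to referent %) per read."""
    return [(100.0 - identity(r, germline), identity(r, referent)) for r in reads]


germline = "AAAAAAAAAA"   # toy germline V gene
referent = "AAAAACCCCC"   # toy referent antibody, 50% diverged from germline
reads = ["AAAAACCCAA", referent]
print(id_plot_coordinates(reads, germline, referent))  # [(30.0, 80.0), (50.0, 100.0)]
```

A lineage clade appears as a dense cluster near the top of such a plot; as SHM accumulates over time, the cluster drifts to the right (more germline divergence).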

Validation of Functional Antibody Clades and Definition of the VRC01 Lineage

For both heavy and light chains, we found large groups of sequences that were divergent from all probe-isolated antibodies but shared VRC01-lineage origin genes. We previously used functional complementation between heavy and light chains to confirm functionality and membership in the VRC01 class (Wu et al., 2011; Zhu et al., 2013). Importantly, reconstitution of unrelated antibody heavy chains with a VRC01 light chain, even using heavy chains with high predicted VRC01 structural compatibility, failed to neutralize HIV-1 in ten out of ten trials (Wu et al., 2011). To assess the thousands of divergent B cell transcripts from NGS, we clustered them into groups: heavy chains were clustered based on CDR H3 identity into CDR H3 groups; light chains, which have short CDR L3s, were clustered based on overall sequence identity into light-chain variable region (VL) groups. We focused on the most populous groups (those
Figure 1. VRC01-Antibody Lineage: 39 Probe-Identified Antibodies Define Three Distinct Clades

(A) Isolation of antigen-specific antibodies by sorting of donor 45 B cells. B cells were probed either with RSC3 and its ΔI371 mutant ΔRSC3 or with a modified outer domain OD4.2 protein HG3.2 and its D368R mutant ΔHG3.2. Indicated in red is the percentage of total IgG+ B cells defined as probe-specific in each gate.

(B) Heavy- and light-chain sequence analysis for six representative antibodies, two from each clade. Residues flanking the CDR H3 are shown in gray. 

(C) VRC08 neutralization dendrogram of 195 HIV-1 Env pseudoviruses with branches colored by potency. 

(D) Maximum-likelihood tree of VRC01-lineage antibodies from donor 45, rooted on the germline V gene sequence. Antibody clades 01+07, 03+06, and 08 are indicated.

(E) Pairwise sequence difference of heavy- (left) and light- (right) chain variable domains for the six representative antibodies. Intra-clade differences are boxed.
See also Figure S1 and Table S1.
Figure 2. NGS-Identified VRC01-Lineage Transcripts in Donor 45 over 15 Years

(A) Clade-specific identity-divergence plots from heavy-chain longitudinal samples. Large example plot is shown at left for clade 08. Sequence divergence from the assigned germline V gene (x axis) and sequence identity to the VRC08 heavy-chain variable domain (y axis). Positions of the six representative antibodies from Figures 1D and 1E are shown as red X’s. Identity-divergence plots for ten longitudinal time points (right). Time points and fraction of a 454 chip used for NGS are indicated at the top. The top row shows a heatmap for positions of all 454 sequences. The total number of sequences is indicated at the right borders. The middle row shows the distribution of cross-donor positive sequences as blue dots, with gray contours indicating raw sequences. The total number of cross-donor positives is displayed in blue. The bottom row shows the distribution of sequences in the same CDR H3 group as any probe-isolated antibody from that clade. Yellow dots indicate raw sequences within the CDR H3 groups, while purple dots show the subset of those sequences that survived quality-control filtering. The total numbers of raw and curated sequences in the CDR H3 groups are indicated in yellow and purple, respectively; blue contours indicate cross-donor positives.

(B) Clade-specific identity-divergence plots from light-chain longitudinal samples. The same analyses are shown as for (A), except that the middle row shows the distribution of sequences with a 5-amino-acid CDR L3. Because sequences are clustered across all ten time points to determine the final curated sequences (Extended Experimental Procedures), groups of in-clade sequences (yellow dots) with high identity to the referent antibody (e.g., in the 01/2007, 07/2007, and 01/2008 time points) may appear to have resulted in no surviving high-quality sequences (no purple dots). In actuality, the representative sequence in the final curated set has simply been chosen from a different time point (e.g., 07/2006).

See also Figure S2 and Table S2. 



containing at least 300 raw heavy-chain sequences or 75 raw light-chain sequences) and chose two representative sequences for testing from each heavy- and light-chain group (Figure 3; Tables S3A and S3B) (Extended Experimental Procedures). Many of these selected sequences were observed at multiple time points (Tables S3C and S3D). For heavy chains, representative sequences from 19 CDR H3 groups neutralized HIV-1 when reconstituted with VRC01 or VRC03 light chain (Figure 3A; Table S3A). Two of these groups, H.I and H.N, appeared to use non-lineage-matching JH genes (Extended Experimental Procedures), though the identity of the JH gene was uncertain due to the high level of SHM (~35% and ~25% diverged from VH1-2, respectively). These ambiguous groups were analyzed separately from the rest of the VRC01 lineage. For light chains, 18 VL groups neutralized HIV-1 when reconstituted with VRC01 or VRC03 heavy chain (Figure 3B; Table S3B). One group, L.C, was composed of multiple unrelated lineages using various Vκ and Jκ genes and was not included in analysis of the VRC01 lineage.
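The selection of groups for testing can be sketched as a size filter followed by a choice of representatives. This sketch assumes that reads have already been assigned to groups (CDR H3 groups for heavy chains, VL groups for light chains) and, purely for illustration, takes the two most frequent distinct sequences in a group as its representatives; the paper's actual representative-selection criteria are given in the Extended Experimental Procedures. Only the 300 and 75 raw-read cut-offs come from the text; the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict

# Minimum raw-read counts stated in the text.
MIN_HEAVY_READS = 300  # heavy-chain CDR H3 groups
MIN_LIGHT_READS = 75   # light-chain VL groups


def select_groups(reads, min_reads, n_reps=2):
    """Keep groups with at least `min_reads` raw reads and pick representatives.

    reads: iterable of (group_key, sequence) pairs. Representatives are the
    n_reps most frequent distinct sequences in each group (an illustrative rule).
    """
    groups = defaultdict(list)
    for key, seq in reads:
        groups[key].append(seq)
    selected = {}
    for key, seqs in groups.items():
        if len(seqs) >= min_reads:
            selected[key] = [s for s, _ in Counter(seqs).most_common(n_reps)]
    return selected


# Toy data: "grp1"/"grp2" stand in for groups; "S1".."S4" stand in for reads.
reads = ([("grp1", "S1")] * 200 + [("grp1", "S2")] * 150
         + [("grp1", "S3")] * 10 + [("grp2", "S4")] * 120)
print(select_groups(reads, MIN_HEAVY_READS))  # {'grp1': ['S1', 'S2']}
print(select_groups(reads, MIN_LIGHT_READS))  # {'grp1': ['S1', 'S2'], 'grp2': ['S4']}
```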

Although they shared the same germline origin genes (VH1-2 and JH1), the diverse heavy-chain sequences identified could derive from a single lineage or multiple lineages. We observed that the 39 probe-identified antibodies from donor 45 all contained a cysteine at position 98 (99 in some sequences due to a 1-aa insertion). This cysteine is not a required feature of the VRC01 class, as VRC01-class antibodies from other donors do
Figure 3. Many Bioinformatically Identified Sequences Not Closely Related to Probe-Identified Antibodies Neutralize Diverse Viral Strains

(A) Representative sequences from the most prevalent CDR H3 groups were synthesized, reconstituted with VRC01 and VRC03 light chains, and tested for 
neutralization. CDR H3 groups confirmed for neutralization were assessed for sequence identity (Figure 4A) to each other and merged into clades (center column). 

not contain such a residue. We therefore used this signature cysteine to assess membership in the VRC01 lineage. Of 1,041 curated NGS sequences assigned to the VRC01 lineage, six did not contain the cysteine while 1,035 did (99.4%). By contrast, of the remaining VH1-2-derived NGS reads from this donor not assigned to the VRC01 lineage, 104,223 sequences did not have this signature cysteine, while 4,641 did (4.3%; p < 0.0001) (Figure S3B).
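The counts above can be checked with a short calculation. The text does not state which test produced p < 0.0001; as an assumption, the sketch below uses a two-sided two-proportion z-test (normal approximation with pooled variance), one standard choice for comparing two proportions.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test (normal approximation, pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return p1, p2, z, p_value


# Counts from the text: reads carrying the signature cysteine / total reads.
lineage = (1035, 1041)           # assigned to the VRC01 lineage
other = (4641, 104223 + 4641)    # other VH1-2-derived reads
p1, p2, z, p = two_proportion_z_test(*lineage, *other)
print(f"{p1:.1%} vs {p2:.1%}, z = {z:.0f}, p = {p:.1e}")
```

The two percentages reproduce the 99.4% and 4.3% given in the text. With counts this large the p value underflows to zero in double precision, so any conventional test of the 2 × 2 table (chi-square or Fisher's exact) would satisfy p < 0.0001 as well.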

To examine the relationship among the neutralizing groups with VRC01-lineage origin genes, we calculated the full-length pairwise identities of the validated representative sequences and probe-identified antibodies. Pairwise identity matrices, grouped by similarity, are shown as heatmaps for heavy and light chains in Figures 4A and 4B, respectively. The CDR H3 and VL groups clustered into higher-order units. Critically, correspondence with maximum-likelihood phylogenetic trees (Figure 4C) confirmed that the clusters of neutralizing groups define phylogenetic clades similar to the three originally defined by the probe-isolated antibodies (Figure S4).
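Grouping sequences into higher-order units from a pairwise identity matrix can be sketched with single-linkage clustering, in which two sequences share a cluster whenever a chain of pairs, each above a threshold, connects them. The paper merged groups with high identity and similar prevalences into clades and confirmed them with maximum-likelihood trees; the single-linkage rule, the 75% threshold, and the toy sequences below are illustrative assumptions, not the authors' procedure.

```python
from itertools import combinations


def identity(a, b):
    """Per-position identity (%) of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs, threshold):
    """Single-linkage clusters of `seqs` (name -> aligned sequence).

    Any pair at >= threshold % identity is linked; clusters are the connected
    components, returned as sorted lists of names.
    """
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


toy = {"g1": "AAAAAAAAAA", "g2": "AAAAAAAACC", "g3": "AAAAAACCCC", "g4": "TTTTTTTTTT"}
print(single_linkage(toy, threshold=75.0))  # [['g1', 'g2', 'g3'], ['g4']]
```

Here g1 and g3 are only 60% identical yet share a cluster through g2. Such chaining is a known property of single linkage, and cross-checking clusters against a phylogenetic tree is one way to guard against it.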

Known heavy- and light-chain pairings from the probe-identified antibodies were used to approximately align the heavy and light maximum-likelihood phylogenetic trees. Temporal prevalence was calculated from the number of non-redundant NGS reads assigned to each of the clades (Figure 4C, middle) (Extended Experimental Procedures). For each clade, prevalence of reads waxed and waned independently over time. For example, clades H5 and L3 remained at a fairly constant level, while clade 01+07 was more prevalent at intermediate time points, and clades H3, H4, and 08 increased in prevalence at later time points. For the three probe-identified clades (01+07, 03+06, and 08), heavy- and light-chain prevalence should correlate temporally; in practice, these correlations ranged from 0.05 to 0.91, suggesting possible sampling issues for clade 03+06.
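Temporal prevalence and its heavy/light concordance reduce to two small calculations: the fraction of non-redundant reads assigned to a clade at each time point, and a correlation between the heavy- and light-chain prevalence series. The sketch below assumes a Pearson correlation (the text does not name the statistic) and uses made-up counts, not data from the study.

```python
import math


def prevalence(clade_counts, total_counts):
    """Fraction of non-redundant reads assigned to a clade at each time point."""
    return [c / t for c, t in zip(clade_counts, total_counts)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical counts for one clade at four time points (not study data).
heavy = prevalence([10, 30, 50, 20], [1000, 1000, 1000, 1000])
light = prevalence([4, 16, 24, 12], [500, 500, 500, 500])
print(round(pearson(heavy, light), 2))  # about 0.98
```

A value near 1 means the heavy- and light-chain reads of a clade rise and fall together, as expected if both come from the same B cells; a low value, as seen for clade 03+06, points to uneven sampling of one chain.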

Overall, the NGS of peripheral VRC01-lineage transcripts provided a large number of sequences (Table S2B), allowing for much greater definition of VRC01-lineage diversity than was possible with the 39 probe-identified antibodies (Figure 1). Multiple branches surround and embed the original three clades in a more diverse phylogenetic tree with three additional heavy-chain clades and two additional light-chain clades (Figure 4C). The greater number of NGS sequences permitted quantification of the initial and newly identified clades over time, thus illuminating the scope, diversity, and development of the lineage.

Conservation of Antigen-Binding Mode within the 01+07 Clade of the VRC01 Lineage

The extraordinary sequence diversity of the VRC01 lineage revealed by NGS could represent possibilities ranging from significant changes in mode of antigen recognition to sequence alteration with a conserved binding mode. To distinguish between these two extremes, we evaluated the longitudinal maintenance of a previously defined set of VRC01 signature residues (Zhou et al., 2013) and the structural conservation of recognition by a clade over time. While only 60%-70% of unrelated VH1-2 sequences conserved at least eight of ten positions in the signature, nearly all sequences from four of the six heavy-chain clades did so (Figures 5A and 5B; Figure S5A). For clade 01+07, we determined co-crystal structures of an antibody from 1995 and compared these to the co-crystal structure of an antibody identified from the 2008 time point (Figures 5C-5F) (Diskin et al., 2011). For the 1995 antibody, co-crystal structures were determined with extended cores both from the autologous gp120 (from donor 45) and from a heterologous gp120. By contrast, the antibody from the 2008 time point was determined in complex only with a heterologous gp120 extended core (Figure 5E; Table S4). We calculated the root-mean-square deviations (rmsds) for the variable domains in the various crystal structures: there was a 0.38 Å rmsd on Cα atoms of the variable domains between the two 1995 antibody structures; between the 1995 structures and the 2008 structure, the rmsd was 0.45 and 0.50 Å for the same gp120 and different gp120s, respectively. Epitope recognition was highly conserved between the 1995 and 2008 antibodies, which interact with a similarly sized area of gp120 (1,185 Å² and 1,166 Å² for the 1995 antibody, and 1,247 Å² for the 2008 antibody), with approximately 95% of the contact area conserved (Figure S5B). The antibody paratope was also highly similar but showed an increase in size for the 2008 antibody compared to the 1995 antibody (1,177 Å² and 1,136 Å² for the 1995 antibody, and 1,458 Å² for the 2008 antibody). This difference in the paratope was largely due to reduced inner domain and bridging sheet interactions for the 1995 antibody compared to the 2008 antibody, and the increase in recognized surface could be attributed to SHM, especially Ser74Tyr and Ser99BArg in the heavy chain.
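The rmsd values quoted above summarize how far matched Cα atoms lie from one another once two structures are placed in a common frame. The sketch below computes rmsd for two equal-length lists of matched coordinates and assumes the structures have already been superimposed; in practice a least-squares superposition (for example, the Kabsch algorithm) is performed first, and that step is omitted here. The coordinates are toy values in Å.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atoms (same units as input).

    Assumes both lists hold (x, y, z) tuples for the same atoms, in the same
    order, already superimposed onto a common frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of matched coordinates")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


# Toy example: each of two atoms is displaced by 0.5 Å along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.5), (1.0, 0.0, -0.5)]
print(rmsd(a, b))  # 0.5
```

On this scale, the 0.38-0.50 Å values reported for the variable domains correspond to sub-ångström average displacements, i.e., essentially superimposable backbones.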

We assessed gp120 binding of a VRC01 variant with the germline sequence in the heavy- and light-chain variable regions, but mature CDR3 regions (since the original unmutated versions of the CDR3 junctions cannot be accurately inferred). Such germline-reverted variants of VRC01-class antibodies have been reported to lack gp120 binding activity (Jardine et al., 2013; McGuire et al., 2013; Zhou et al., 2010). Nonetheless, here we show binding to the early autologous gp120 molecule, d45-01dG5 (GenBank accession number JQ609687) (Wu et al., 2012) (Figure S5C), with affinity in the micromolar range. The affinity of the VRC01 lineage for this gp120 improved to sub-nanomolar levels for the 1995 and 2008 antibodies. These later antibodies were also able to interact with many more autologous and heterologous gp120 strains. Overall, the structures indicate that the 20 changes in amino acid sequence due to SHM between 1995 and 2008 antibodies enhance antibody-antigen interactions. However, antibody recognition over the 13-year time period remains highly similar, with core interactive residues


Two groups with non-matching J gene assignments (gray rows) were excluded from further analysis. Total sequences in each group, probe-identified representative (if any), assigned V and J genes, and the most neutralizing representative and its CDR H3 sequence (left columns) are shown. Neutralization breadth and potency for both VRC01 and VRC03 light-chain pairings are provided against selected HIV-1 viruses from clades A, B, and C (right columns).

(B) The same information as shown in (A), for the most prevalent light-chain variable region groups tested for neutralization after reconstitution with the heavy chains from VRC01 and VRC03. Group L.C, shown in gray, consisted of multiple unrelated clones and was excluded from further analysis.

See also Figure S3 and Table S3.
Figure 4. Heavy and Light Chain of the VRC01 Lineage Exhibit Multi-Clade Phylogenetic Organization and Concordance

(A) Heatmap showing pairwise sequence identities for heavy chains of probe-identified antibodies and validated neutralizing heavy-chain sequences sorted by overall prevalence (among final curated sequences). Groups with high identity and similar prevalences were merged into clades, shown at left.

mostly unchanged. Thus, despite extraordinary sequence variation, the VRC01 lineage maintains a largely conserved binding mode.

Rate of VRC01-Lineage Evolution

Since somatic variants of the VRC01 lineage derived from a common precursor B cell, it is possible to determine evolutionary rates for the maturation of the lineage as well as for each clade. To quantify the rate of lineage evolution, we used BEAST v1.8 (Drummond and Rambaut, 2007), which has been used previously to estimate evolutionary rates for sequences from various population types, including HIV (Alizon and Fraser, 2013; Vrancken et al., 2014). It is important to note that evolutionary rate is not equivalent to the increase in germline divergence, as it describes the overall rate of substitutions, not all of which result in increased germline divergence. For example, the same site can mutate multiple times, including reverting to the original nucleotide.
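A toy example makes the distinction concrete: when a site mutates and later reverts, or when the same site mutates twice, the substitution count grows while divergence from germline does not. The mutation history below is invented for illustration.

```python
def apply_history(germline, events):
    """Apply (position, new_base) substitution events in order.

    Returns the final sequence, the number of substitutions, and the number
    of positions that differ from germline (germline divergence).
    """
    seq = list(germline)
    for pos, base in events:
        if seq[pos] == base:
            raise ValueError("an event must change the base at that position")
        seq[pos] = base
    final = "".join(seq)
    differences = sum(a != b for a, b in zip(germline, final))
    return final, len(events), differences


germline = "ACGTACGTAC"
# Site 2 mutates G->T and then reverts T->G; site 5 mutates twice (C->A->T).
events = [(2, "T"), (5, "A"), (2, "G"), (5, "T")]
final, n_subs, n_diff = apply_history(germline, events)
print(n_subs, n_diff)  # 4 1
```

Four substitutions leave only one position different from germline, so a rate estimated from substitutions can exceed the apparent rate of divergence from germline.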

Over the entire time period of the study, VRC01-lineage heavy and light chains had similar evolutionary rates of 2.1 and 1.6 substitutions per 100 nucleotides per year, respectively (Figure 6A, green and blue panels). This rate of mutation was similar to the rate of 1.5 substitutions per 100 nucleotides per year that we determined for the Env gene in this donor (Figure 6A, red panel) based on previously determined viral genome sequences (Wu et al., 2012), and the rate of 1.9 substitutions per 100 nucleotides per year determined for previously reported Env sequences from donor CH505 (Liao et al., 2013). These results are also consistent with previous estimations of Env evolution rates, which have ranged from 0.69 to 1.4 substitutions per 100 nucleotides per year (Alizon and Fraser, 2013; Chaillon et al., 2012; Vrancken et al., 2014). We also calculated evolutionary rates for each clade independently (Figure 6A, green and blue panels). Although clades H4 and L3 showed rates somewhat higher than the other clades, evolutionary rates for all of the VRC01 clades were similar at approximately two substitutions per 100 nucleotides per year.
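BEAST infers rates within a Bayesian molecular-clock framework, which is beyond a short example. A much simpler stand-in that conveys what "substitutions per 100 nucleotides per year" measures is root-to-tip regression: regress each sequence's genetic distance from the root (in substitutions per site) on its sampling date, and multiply the slope by 100. This is not the method used here, and the data points below are hypothetical, chosen to give a slope near the ~2 per 100 nucleotides per year reported above.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical (sampling year, root-to-tip distance in substitutions per site).
years = [1995, 2000, 2005, 2009]
distances = [0.30, 0.40, 0.50, 0.58]
slope, _ = linear_fit(years, distances)
print(f"{slope * 100:.1f} substitutions per 100 nucleotides per year")
```

For these points the slope is 0.02 substitutions per site per year, printed as 2.0 per 100 nucleotides per year. Root-to-tip regression treats tips as independent and gives no credible interval, which is one reason model-based tools such as BEAST are preferred for the published estimates.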

To provide an overview of the VRC01 lineage and its development, we produced maximum-likelihood phylogenetic trees with curated sequences, annotated with evolutionary rates for the VRC01-lineage clades (Figure 6B). Each sequence is colored based on its apparent “birthday,” the time point at which it was first identified in the NGS data. Importantly, advancing time (indicated by the progression of colors) and advancing maturation (indicated by branch positions) were consistent, in that later transcripts appeared at greater radial distances on the tree.
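Assigning each sequence its "birthday" amounts to taking the earliest time point at which that sequence appears. A minimal sketch, assuming observations arrive as (time point, sequence) pairs and that time points use a zero-padded "YYYY-MM" string format (a hypothetical convention that makes string comparison order dates correctly):

```python
def birthdays(observations):
    """Map each sequence to the earliest time point at which it was observed."""
    first_seen = {}
    for time_point, seq in observations:
        if seq not in first_seen or time_point < first_seen[seq]:
            first_seen[seq] = time_point
    return first_seen


obs = [("2008-08", "seqA"), ("1995-03", "seqA"), ("2002-04", "seqB"), ("2009-12", "seqB")]
print(birthdays(obs))  # {'seqA': '1995-03', 'seqB': '2002-04'}
```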

Evolutionary Rate of the VRC01 Lineage Appears to Slow 

To compare the evolutionary rates of the VRC01 lineage with other broadly HIV-1-neutralizing antibody lineages, we retrieved CAP256-VRC26 (Doria-Rose et al., 2014) and CH103 (Liao et al., 2013) lineage NGS sequences from GenBank. Average germline divergence for these lineages increased more rapidly than for the VRC01 lineage (Figure S6A). We also found evolutionary rates of 11 and 9.3 substitutions per 100 nucleotides per year, respectively, for the CAP256-VRC26 lineage heavy and light chains, and 13 and 8 substitutions per 100 nucleotides per year, respectively, for the CH103 lineage (from donor CH505) heavy and light chains (Figure 6C, purple panel). Each of these early rates was ~5-fold higher than that observed for the VRC01 lineage, which was at a much later stage in its development.
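As an arithmetic check of the ~5-fold difference, the sketch below divides each early-lineage rate by the corresponding whole-study VRC01-lineage rate (2.1 for heavy and 1.6 for light chains, in substitutions per 100 nucleotides per year). Pairing the chains this way is an assumption about how the comparison was made.

```python
# Whole-study VRC01-lineage rates reported above.
vrc01 = {"heavy": 2.1, "light": 1.6}

# Early-lineage rates reported above for the other two lineages.
others = {
    ("CAP256-VRC26", "heavy"): 11.0,
    ("CAP256-VRC26", "light"): 9.3,
    ("CH103", "heavy"): 13.0,
    ("CH103", "light"): 8.0,
}

fold = {key: rate / vrc01[key[1]] for key, rate in others.items()}
for (lineage, chain), f in fold.items():
    print(f"{lineage} {chain}: {f:.1f}-fold")
```

The ratios come out at about 5.2, 5.8, 6.2, and 5.0, consistent with the ~5-fold difference stated in the text.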

Since these rate differences in principle could arise from differences between the donors, we undertook direct rate calculations on donor 45 datasets comprising the beginning (1995-2002) and end (2006-2009) of the study period (Figure 6C, green and blue panels). For aggregate data, including all clades, the evolutionary rate was higher for the early time period than the later one (2.1 versus 1.6 substitutions per 100 nucleotides per year). Although this difference was not statistically significant, it is consistent with the hypothesis that antibody SHM occurs at a faster rate during the early phase of lineage development. Similarly, when data from each clade were considered separately, the calculated rate was in all cases higher for the early time period than the later one, further supporting the idea that evolutionary rates slowed over time.

As a third test for the slowing of evolutionary rates, we used the BEAST package to infer the date of the most recent common ancestor sequence for each lineage. If evolutionary rate were stable over time, we would expect this extrapolation to give a reasonable estimate (Smith et al., 2009). For the VRC01 lineage, the most recent common ancestor was estimated to have occurred in 1971 for the heavy chain and in 1979 for the light chain (Figure S6B). Although the exact date at which donor 45 was infected is unknown, the inferred dates are implausible, as the AIDS epidemic began in the early 1980s. Similarly, the most recent common ancestor of the CAP256-VRC26 lineage was calculated to have occurred in early 2005 (Figure S6C), although the UCA of that lineage is known to have appeared sometime in March 2006 (Doria-Rose et al., 2014). The same inconsistency is also observed for the CH103 lineage (Liao et al., 2013) (Figure S6D). The consistent prediction of common ancestors at impossibly or implausibly early dates suggests that the evolutionary rate was faster earlier in lineage development. The fact that this was also observed for the younger CAP256-VRC26 and CH103 lineages implies that this slowing begins almost immediately.
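Why a slowing rate pushes the inferred common ancestor too early can be seen with root-to-tip regression on toy data. Suppose divergence accrues quickly for a few years after the lineage arises and slowly thereafter; a straight line fitted only to samples from the slow phase has a shallow slope, so extrapolating back to zero divergence (the x-intercept) lands well before the true origin. The numbers below are invented, and BEAST's dating uses a different, model-based method.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def divergence(year, origin=1990.0):
    """Toy history: 0.05 substitutions/site/year for 4 years, then 0.01."""
    t = year - origin
    return 0.05 * t if t <= 4 else 0.20 + 0.01 * (t - 4)


sample_years = [1995, 2000, 2005, 2009]  # late, slow-phase sampling only
slope, intercept = linear_fit(sample_years, [divergence(y) for y in sample_years])
inferred_origin = -intercept / slope      # year at which fitted divergence is zero
print(round(inferred_origin))             # 1974, versus a true origin of 1990
```

Fitting a constant rate to late samples thus places the ancestor 16 years too early in this toy case, the same direction of error as the implausibly early dates described above.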

Thus, three distinct lines of evidence support the idea that SHM persists over an extended period of time, but evolutionary rate slows as an antibody lineage matures. Overall, clades within the VRC01 lineage had similar evolutionary rates. Rates for other lineages, however, differed from those of the VRC01 lineage, varying by at least a factor of 5 between fast early and slower late lineage development. Studies



(B) Heatmap for light chains of probe-identified antibodies and validated neutralizing light-chain sequences, produced as in (A). 

(C) Maximum-likelihood phylogenetic trees for heavy-chain (left) and light-chain (right) sequences. The clades described in (A) and (B) can be clearly seen in the structure of the trees, which have similar overall topology. Temporal prevalence (middle) is charted for each clade as the fraction of unique sequences at each time point (with an in-frame junction and no stop codons, but without manual curation) that are assigned to that clade. For clades 01+07, 03+06, and 08, which are anchored by probe-identified antibodies, the correlation of heavy- and light-chain temporal prevalence is shown (boxes). The average divergence for each clade at each time point is calculated from curated sequences.

See also Figure S4. 



Cell 161 , 470-485, April 23, 2015 ©2015 Elsevier Inc. 477 




[Figure 5 graphics: (A) percentage of sequences conserving the signature, by clade and time point; (B) sequence logos at VH positions 49, 50, 51, 52, 55, 57, 58, 68, 69, and 71; (C) longitudinal phylogenetic trees (1995-2009); (D) heavy- and light-chain alignments against germline VH1-2*02 and VK3-20*01; (E and F) antibody-gp120 complex structures.]

Figure 5. Conservation of VRC01+07 Clade Recognition over 15 Years of Chronic Maturation 

(A) Percentage of curated sequences in each clade conserving at least eight of the ten heavy-chain signature residues of VRC01-class antibodies. Bars are not shown where data were unavailable for a particular clade and time point. Nearly all sequences in four of six heavy-chain clades conserve at least eight of the ten positions, compared to only 60%-70% of unrelated VH1-2-derived sequences (gray bars). Clades H3 (dark green) and H4 (orange) mutate away from the signature over the course of the study.

(B) Sequence logos of the VRC01-class signature positions for each clade. Residues colored in blue do not match the defined signature. Note that nine of the ten
signature residues are contained in the VH1-2 germline. Thus, although the dominant residue at each position in non-VRC01-lineage sequences matches the
signature, there is variation at almost every position, and any individual read is likely to have more than two residues mutated (A). For VRC01-lineage sequences, by contrast,
there is strong conservation at any given position (even at positions mutated away from the signature), and relatively few positions show any variation at all.

(C) Longitudinal phylogenetic trees of clade 01+07 sequences for heavy chain (left) and light chain (right). The color of each sequence corresponds to the date at
which it was first identified in the NGS data.

(D) Sequence alignment of heavy-chain (top) and light-chain (bottom) sequences from temporally diverse members of the 01+07 clade, showing mutations from the
germline V gene.

(E) Crystal structures of autologous (blue, top left) or heterologous (pale yellow, top middle and top right) gp120s determined in complex with clade 01+07
antibodies (bottom). The complexes have been rotated to show the interacting surfaces on each molecule. Contacts made by 45-VRC01.H01+07.O-863513/45-VRC01.L01+07.O-110653
(NGS-derived sequences from 1995; bottom left and bottom middle) are colored green, while those made by NIH45-46 (isolated
in 2008; bottom right) are shown in red. Heavy-chain contacts are in the darker shade. These structures show a common binding mode, with a modest increase
in buried surface area from the 1995 to the 2008 antibodies when bound to heterologous gp120.

(F) Antibody structures from (E) shown in ribbon format, with residues mutated from the germline V gene (top) or the 1995 sequence (bottom) shown in stick 
representation. 

See also Figure S5 and Table S4. 
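To make the counting rule in panel (A) concrete, here is a minimal Python sketch that scores each sequence against a ten-position signature and reports the fraction of a clade conserving at least eight positions. The signature dictionary, its position labels (P1-P10), and its residues are hypothetical placeholders, not the published VRC01-class signature, and sequences are assumed to be already numbered so that each position label maps to one residue.

```python
from typing import Dict, List

# HYPOTHETICAL placeholder signature: ten position labels mapped to the
# residue expected at each. Replace with the real VRC01-class signature.
HYPOTHETICAL_SIGNATURE: Dict[str, str] = {
    "P1": "W", "P2": "N", "P3": "R", "P4": "G", "P5": "A",
    "P6": "Y", "P7": "Q", "P8": "W", "P9": "D", "P10": "R",
}


def signature_matches(seq: Dict[str, str], signature: Dict[str, str]) -> int:
    """Number of signature positions where `seq` carries the signature residue.

    `seq` maps position labels (after numbering/alignment) to residues;
    positions missing from `seq` count as mismatches.
    """
    return sum(1 for pos, aa in signature.items() if seq.get(pos) == aa)


def fraction_conserving(
    seqs: List[Dict[str, str]], signature: Dict[str, str], min_matches: int = 8
) -> float:
    """Fraction of sequences matching at least `min_matches` signature positions."""
    if not seqs:
        raise ValueError("no sequences supplied")
    hits = sum(1 for s in seqs if signature_matches(s, signature) >= min_matches)
    return hits / len(seqs)


if __name__ == "__main__":
    full = dict(HYPOTHETICAL_SIGNATURE)        # 10 of 10 positions match
    two_off = {**full, "P3": "K", "P7": "E"}   # 8 of 10 positions match
    three_off = {**two_off, "P9": "N"}         # 7 of 10 positions match
    # Two of the three toy sequences pass the >= 8 threshold.
    print(fraction_conserving([full, two_off, three_off], HYPOTHETICAL_SIGNATURE))
```

Treating a missing position as a mismatch is a conservative choice; with real reads, gaps or unassigned positions would need an explicit policy.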
Figure 6. Rates of Evolution and Extents of Divergence for Clades of the VRC01 Lineage over 15 Years of Chronic Infection

(A) Point estimates and 95% highest probability density intervals of evolutionary rates for heavy-chain (green panel) and light-chain (blue panel) transcripts of the VRC01 lineage,
shown for the lineage overall and for individual clades, calculated from subsets of the curated sequence set. The dashed line represents the rate calculated for HIV-1 Env
in donor 45. Evolutionary rates of Env for HIV genomes isolated from both donor 45 and donor CH505 (red panel), calculated from deposited sequences for those datasets,
are comparable to the evolutionary rate of the VRC01 lineage.

(B) Heavy-chain (left) and light-chain (right) phylogenetic trees for curated sequences from the VRC01 lineage, annotated to show point estimates of evolutionary rate for
each clade. Trees are shown as “birthday” trees, with each sequence colored to show the time point at which it was first observed. Later observation times
(warmer colors) consistently coincided with greater maturation (larger radial distances).

(C) Point estimates and 95% confidence intervals of evolutionary rate for VRC01-lineage transcripts calculated separately for the beginning (1995-2002) and end
(2006-2009) of the study period (green and blue panels). Sequences used are subsets of those described for (A). In each case, the earlier rate is faster than the
later one, though the differences do not reach statistical significance. For comparison, evolutionary rates of early HIV-1-neutralizing antibody lineages from
donors CH505 and CAP256 (purple panel), calculated from deposited sequences for those lineages, are ~5-fold higher. Additionally, estimated dates of the least
common ancestor for each lineage are consistently earlier than known or plausible dates (Figure S6), suggesting that evolutionary rates were faster earlier in the history of
each lineage.

See also Figure S6. 



of the early lineages CAP256-VRC26 and CH103 suggested that
this slowing begins very early in lineage development.
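The early-versus-late rate comparison can be illustrated with a much simpler stand-in than the phylogenetic estimates (with highest-probability-density intervals) reported in Figure 6. The Python sketch below fits an ordinary least-squares slope of divergence from germline against sampling year, separately within an early and a late window, in the spirit of a root-to-tip regression. This is a swapped-in technique, not the method used here, and the toy divergence values are hypothetical, chosen only so that the early slope exceeds the late one.

```python
from typing import Dict, List, Tuple

# (sampling year, divergence from germline as a fraction of sites)
Point = Tuple[float, float]


def ols_slope(points: List[Point]) -> float:
    """Least-squares slope of divergence versus sampling year (per site per year)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share one sampling year")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def windowed_rates(
    points: List[Point], windows: List[Tuple[int, int]]
) -> Dict[Tuple[int, int], float]:
    """Fit a separate slope within each inclusive (start_year, end_year) window."""
    return {
        (start, end): ols_slope([p for p in points if start <= p[0] <= end])
        for start, end in windows
    }


if __name__ == "__main__":
    # Toy values only, not data from this study.
    toy = [
        (1995, 0.200), (2000, 0.225), (2002, 0.235),
        (2006, 0.280), (2007, 0.282), (2008, 0.284), (2009, 0.286),
    ]
    for window, rate in windowed_rates(toy, [(1995, 2002), (2006, 2009)]).items():
        print(window, f"{rate:.4f} substitutions/site/year")
```

A plain regression ignores shared ancestry among sequences, which is one reason phylogenetic methods are preferred for the published estimates.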

VRC01-Lineage Recognition: Epitope Variation, Disulfide Patterns, and Probed Surfaces

The continuous mutation observed over years of chronic
infection provides a means to achieve the extraordinary levels
of SHM observed with many of the anti-HIV-1 broadly
neutralizing antibodies. For the VRC01 lineage, this divergence
reaches over 30% relative to the germline genes for individual
lineage members, although divergence varies considerably
among individual members of the lineage, with multiple
clades evolving in parallel (Figure 6B).



Cell 161, 470-485, April 23, 2015 ©2015 Elsevier Inc. 479







[Figure 7 panels: (A) VRC01-lineage heavy-chain phylogenetic tree (clades H5, H4, 03+06, H3, 08, and 01+07) annotated for heavy-chain FR3 indels, CDR H3 indels, CDR H3 cysteines, and new structures; (B) epitope, recognition, and paratope views (CDR H1, CDR H2, CDR H3, FR3; heavy chain) for 45-VRC01.H5.F-185917, VRC03/VRC06/VRC06b/45-VRC01.H03+06.D-001739, VRC08/VRC08c/45-VRC01.H08.F-117225, and VRC01/NIH45-46.]

Figure 7. The VRC01 Lineage Evolves Divergent Recognition Loops and CDR H3 Disulfides 

(A) Heavy-chain phylogenetic tree for probe-identified antibodies and validated neutralizing sequences of the VRC01 lineage annotated for the acquisition of 
molecular features. 

(B) Epitope, recognition, and paratope for representative VRC01 clades. Structures of VRC01-lineage antibodies in complex with gp120 are shown with CDR H1
in blue, CDR H2 in yellow, CDR H3 in red, FR3 in cyan, and light chain in dark gray. These are displayed on the gp120 surface for epitope (left column) and as colored
regions of a ribbon diagram for recognition (middle column), and the CDR H3 and associated cysteines are highlighted in paratope (right column). For clade H5
(top row), the NGS-identified heavy chain 45-VRC01.H5.F-185917 paired with the light chain from VRC01 is shown. NGS-derived heavy chains in clades 03+06
and 08 were paired with VRC03 and VRC08 light chains, respectively.

See also Figure S7 and Table S5. 
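NGS-derived antibodies in this lineage are named with the convention “donor-lineage.[H/L]clade.time-point-read_number” (e.g., 45-VRC01.H5.F-185917). A minimal Python sketch that splits such names into their fields follows; the regular expression is inferred from the example names in this article and assumes a single uppercase letter as the time-point code and clades written as digits optionally joined by “+”. The `NgsName` class and `parse_ngs_name` function are hypothetical helpers, not part of any published code base.

```python
import re
from dataclasses import dataclass

# Pattern inferred from names such as "45-VRC01.H03+06.D-001739" and
# "45-VRC01.L01+07.O-110653"; assumes a one-letter time-point code.
NGS_NAME = re.compile(
    r"^(?P<donor>\d+)-(?P<lineage>[A-Za-z0-9]+)\."
    r"(?P<chain>[HL])(?P<clade>[0-9]+(?:\+[0-9]+)*)\."
    r"(?P<time_point>[A-Z])-(?P<read_number>\d+)$"
)


@dataclass(frozen=True)
class NgsName:
    donor: str
    lineage: str
    chain: str        # "H" (heavy) or "L" (light)
    clade: str        # e.g. "03+06", "08", "5"
    time_point: str   # one-letter code for the sampling time point
    read_number: str  # kept as a string to preserve leading zeros


def parse_ngs_name(name: str) -> NgsName:
    """Split an NGS-derived antibody name into its nomenclature fields."""
    match = NGS_NAME.match(name.strip())
    if match is None:
        raise ValueError(
            f"not in donor-lineage.[H/L]clade.time-point-read_number form: {name!r}"
        )
    return NgsName(**match.groupdict())


if __name__ == "__main__":
    for n in ["45-VRC01.H03+06.D-001739", "45-VRC01.H5.F-185917", "45-VRC01.L01+07.O-110653"]:
        print(parse_ngs_name(n))
```

Conventionally named antibodies such as VRC08c do not follow this pattern and are rejected with a `ValueError`.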



Several of the newly identified VRC01-lineage clades showed
remarkable differences. For example, the VRC08 CDR H3 is 11
residues longer than that of VRC01 and contains three additional
cysteines. To understand how structural recognition varies
among VRC01-lineage clades, we crystallized the antigen-binding
fragments (Fabs) of representative antibodies from each clade
in complex with HIV-1 gp120. Representative antibodies were
chosen based on their expression level (>5 mg/l), neutralization
potency (Figure 3), and phylogenetic placement (Figure 7A). Of
ten antibodies, six formed crystals with extended core gp120
that diffracted beyond 3.5 Å; their structures were solved by
molecular replacement, and the structural features of their
recognition were analyzed.
New structures included two from the 03+06 heavy-chain
clade (VRC06b and 45-VRC01.H03+06.D-001739; for
NGS-defined antibodies we use the nomenclature
“donor-lineage.[H/L]clade.time-point-read_number” for heavy
and light chains), three from the 08 heavy-chain clade (VRC08,
VRC08c, and 45-VRC01.H08.F-117225), and one from the H5
heavy-chain clade (45-VRC01.H5.F-185917). We analyzed these
structures along with previously determined antibody-gp120
structures from donor 45, comprising those of VRC01, VRC03,
NIH45-46, and VRC06 (Diskin et al., 2011; Georgiev et al.,
2013; Wu et al., 2011; Zhou et al., 2010). When gp120s were
superimposed, the modes of gp120 recognition by the variable
domains of the donor 45 antibodies resembled each other,
with the consensus/shared framework regions of heavy chains
(rmsds = 0.46-2.41 Å) showing moderately greater diversity than
those of light-chain variable domains (rmsds = 0.36-1.65 Å)
(Table S5). While intra-donor differences between
common/consensus framework regions were not significantly
different from the inter-donor differences among VRC01-class
antibodies from different donors, two antibody heavy-chain regions
showed substantial clade diversity: the CDR H3 and heavy-chain
framework region 3 (FR3) (Figure 7B). The clade 08 CDR
H3 loop (23 amino acids) extends toward the inner domain of
gp120 and contributes ~35% and ~70% more binding surface
on gp120 than the CDR H3 loops of clades 01+07 and 03+06,
respectively (Figure 7B; Table S5). Unlike the CDR H3 of clade
08, the CDR H3s of clades 03+06 and H5 make substantially
less contact with the inner domain of gp120 (~130 Å² for clade
03+06, ~200 Å² for clade H5, and ~450 Å² for clade 08). Meanwhile,
clade 03+06 antibodies are unique within the lineage in having
extended FR3s, the result of a seven-amino-acid insertion.
These extended FR3s provide ~1.5-, ~2.5-, and ~4.5-fold more
binding surface on the gp120 V1/V2 stem region than the FR3s
of clades 01+07, 08, and H5, respectively (Table S5), and induce
a V1V2 conformation different from that in other antibody-gp120
structures (Figure S7A). Because the antibody-Env complexes
determined here were in the monomeric gp120 core context, we
docked the antibodies into the recently determined structure of
the trimeric BG505 SOSIP (Julien et al., 2013; Lyumkis et al.,
2013; Pancera et al., 2014). In the trimeric context, the extended
FR3s were spatially proximal to V1V2 (residues 203-206) and
V3 (residues 314-318) of the neighboring protomer (Figure S7B).
Small rearrangements could potentially allow energetically
favorable contacts, with the positively charged V3 interacting
electrostatically with the negatively charged FR3 insertions.
The large CDR H3 alterations, by contrast, projected onto the
inner domain and did not interact with neighboring protomers,
although the CDR H3 of VRC08c makes potential contacts
with helix α0 of the neighboring gp120 (Figure S7C).
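Because the rmsd ranges above are measured after superimposing gp120, the residual deviation between equivalent antibody framework atoms captures both differences in docking orientation on gp120 and any internal differences in the frameworks. A minimal Python sketch of that final step follows, assuming the gp120-based superposition and the atom-to-atom correspondence (for example, matched Cα atoms of the shared framework) have already been established; the coordinates in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd_in_common_frame(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between matched atoms, with NO further fitting of the antibodies.

    Assumes both coordinate sets were already placed in one frame by
    superimposing their gp120s, and that a[i] and b[i] are equivalent atoms.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need the same non-zero number of matched atoms")
    sq = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(sq / len(a))


if __name__ == "__main__":
    # Hypothetical framework atoms; the second antibody sits 1 Å higher along z.
    ab1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    ab2 = [(x, y, z + 1.0) for x, y, z in ab1]
    print(f"{rmsd_in_common_frame(ab1, ab2):.2f} Å")
```

Skipping a second, antibody-on-antibody fit is the key design choice: fitting the frameworks to each other would remove exactly the docking differences this comparison is meant to reveal.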

Notably, the disulfide bonding patterns in the CDR H3 loops
vary between clades (Figure 7B). Clade 01+07 antibodies have an
interloop disulfide bond between Cys98 (CDR H3) and Cys32 (CDR H1).
Clades 03+06 and H5 each have a single intraloop disulfide bond,
between Cys98 and Cys100A and between Cys98 and Cys100C,
respectively. Clade 08 has two intraloop disulfide bonds
(Cys98-Cys100J and Cys99-Cys100E) within the CDR H3 loop,
which likely add structural rigidity to this CDR H3, approximately
ten residues longer than that of most of the other clades.



Although CDR H3 cysteine diversity has been observed before
in bovine antibodies (Wang et al., 2013), the bovine mechanism
depends on specific D genes optimized for cysteine pairing,
whereas in the present case cysteine diversity derives
from SHM within a single lineage.

To investigate the functional consequences of clade
diversity, we analyzed neutralization fingerprints for a
representative selection of VRC01-lineage antibodies. Remarkably,
a dendrogram built to represent the neutralization (functional)
features mirrored dendrograms based on sequence phylogeny
and, in particular, replicated the overall clade structure
(Figure S7D). Thus, the diverse clades do have functional
differences that reflect their divergent sequences and clade
organization.
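One way to turn neutralization fingerprints into a dendrogram is sketched below in Python: pairwise distances of 1 minus the Pearson correlation between fingerprints (for example, log10 IC50 values against a common virus panel), followed by average-linkage (UPGMA) clustering. The distance measure, the linkage rule, and the toy fingerprints are all assumptions for illustration; this is not necessarily how the dendrogram in Figure S7D was built.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length fingerprints."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("fingerprints must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant fingerprint has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def upgma(fingerprints: Dict[str, List[float]]) -> Tree:
    """Average-linkage (UPGMA) clustering on 1 - Pearson correlation distance."""
    if not fingerprints:
        raise ValueError("no fingerprints supplied")
    names = list(fingerprints)
    trees: Dict[int, Tree] = dict(enumerate(names))
    sizes: Dict[int, int] = {i: 1 for i in trees}
    # Pairwise distances keyed by (smaller_id, larger_id).
    dist: Dict[Tuple[int, int], float] = {
        (i, j): 1.0 - pearson(fingerprints[names[i]], fingerprints[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.__getitem__)  # closest pair of clusters
        new = next_id
        next_id += 1
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean equals the average over all leaf pairs.
            dist[(k, new)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    return next(iter(trees.values()))


def leaves(tree: Tree) -> List[str]:
    """Leaf names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


if __name__ == "__main__":
    toy = {  # hypothetical log10 IC50 values against four viruses
        "A1": [0.1, 0.5, 1.0, 2.0],
        "A2": [0.2, 0.6, 1.1, 2.1],
        "B1": [2.0, 1.0, 0.5, 0.1],
        "B2": [2.1, 1.1, 0.6, 0.2],
    }
    print(upgma(toy))
```

Correlation distance compares the shape of each fingerprint rather than overall potency, so two antibodies that neutralize the same viruses at different absolute IC50s still cluster together.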

Despite originating from a single ancestral B cell, different
clades of the VRC01 lineage have evolved substantial
differences in disulfide-bonding patterns and in the lengths of
some of their antigen-binding loops. The CDR H2, the central
contact surface of the antibody lineage with gp120, does not
vary in this manner, nor do the other loops of the variable domains.
The peripheral role of both the CDR H3 and the FR3 region
in gp120 recognition may allow extraordinary variation as
different antibody variants evolve to probe the gp120 surface
in distinctive ways.

DISCUSSION 

It has been unclear how antibody lineages achieve the
unusually high levels of mutation commonly found in broadly
HIV-1-neutralizing antibodies. This led us to investigate the
evolutionary history of B cell lineages that produce such
antibodies. SHM and B cell selection are a form of “accelerated
evolution” that generates high-affinity antibodies. In most cases, this
process takes a few weeks and involves only a handful of
mutations, averaging about 5% nucleotide difference from the
originating antibody genes. However, some broadly HIV-1-neutralizing
antibodies are characterized by extraordinary levels of
SHM, with >30% of nucleotides in the antibody variable region
differing from the germline-encoded sequence (Burton et al., 2012;
Kwong and Mascola, 2012; Mascola and Haynes, 2013). One
such antibody, VRC01 (Wu et al., 2010), was isolated
approximately two decades after the donor was diagnosed with HIV-1
infection. Here, we used antibody isolation, NGS, and crystal
structures to characterize the VRC01 lineage from 1995 to 2009.
We observed extraordinary diversity, the result of a high rate of
SHM over years of chronic infection. Of note, several prior
studies indicated that germline-reverted variants of VRC01-class
antibodies do not bind HIV-1 gp120, raising the question of how
such antibodies arise (Jardine et al., 2013; McGuire et al., 2013; Zhou
et al., 2010); our results indicate that germline-reverted VRC01
binds an early autologous Env (d45-01dG5) (Wu et al., 2012),
suggesting that the VRC01 lineage was initiated by interaction
with a specific autologous Env sequence (Figure S5C).
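The SHM levels quoted above (about 5% typically, >30% for some broadly neutralizing antibodies) are nucleotide differences from the germline-encoded sequence. A minimal Python sketch of that percentage follows, assuming the mature and germline sequences are already aligned to equal length; skipping columns with a gap in either sequence is a simplifying assumption, since how insertions such as the seven-residue FR3 insertion of clade 03+06 are scored is not specified here. The toy sequences are not real V-gene segments.

```python
def percent_divergence(germline: str, mature: str, gap: str = "-") -> float:
    """Percent of compared nucleotide positions where `mature` differs from `germline`.

    Both sequences must already be aligned to the same length; columns with a gap
    in either sequence are skipped (an assumption: indels are not scored here).
    """
    if len(germline) != len(mature):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = mismatches = 0
    for g, m in zip(germline.upper(), mature.upper()):
        if g == gap or m == gap:
            continue  # skip indel columns
        compared += 1
        if g != m:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    germline = "ACGTACGTAC"  # toy sequences, not real V-gene segments
    mature = "ACGAACCTAG"
    print(f"{percent_divergence(germline, mature):.1f}% divergence")
```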

The NGS-derived sequences determined here are expected
to contain errors arising from cDNA preparation, PCR
amplification, and the 454 sequencing platform. We used the 454
pyrosequencing platform because its read lengths allowed
identification of full variable-region sequences. Unfortunately, the
sequencing reactions were error prone. We implemented
quality-control procedures to reduce errors (Extended Experimental
Procedures) and to parse the NGS-derived data into three levels
of experimental certainty. First, “raw” VRC01-lineage
sequences, from all ten time points, comprise 124,834 heavy-chain
and 28,500 light-chain sequences. Based on previously
published data, these sequences are expected to have an RMS
sequence error of 1.38% (Zhu et al., 2012). Second, “curated”
sequences comprise 1,041 heavy- and 492 light-chain
sequences; of these, 162 heavy-chain and 119 light-chain
sequences were observed to have biological replicates at the
97.25% identity level at other time points. Third, “confirmed
neutralizers” comprise the closest-to-consensus or most highly
represented sequences, and these are expected to have a median
sequence accuracy of 98.9% identity, based on the accuracy of
VRC01-class plasmids (Extended Experimental Procedures).
Additionally, 36% of the confirmed neutralizing sequences had
temporal biological replicates at >97.25% identity. Finally, we
note that evolutionary rate calculations were performed with
curated sequences and that the results were similar over multiple
bootstraps, suggesting that these calculated rates were not
substantially affected by NGS-sequence errors. Thus, the
“leaves” of the VRC01-lineage phylogenetic tree and the raw data
(Figure 2) contain substantial uncertainty, whereas the overall clade
structure (Figures 4 and 7), which is defined by the confirmed
neutralizers and probe-identified antibodies and consists of
clades that differ on average by 40%-50% in sequence, and the
overall rate of evolution (Figure 6) should not be substantially
affected by errors in the NGS data.
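The replicate criterion above (a matching read at another time point at the 97.25% identity level) can be sketched in Python as follows. The sketch assumes reads are already aligned to equal length, represents each read as a hypothetical `(time_point, sequence)` pair, skips reads of different length instead of aligning them, and treats exactly 97.25% as passing; the exact alignment and identity definitions used for the curated set are described in the Extended Experimental Procedures, not here.

```python
from typing import List, Tuple

# (time-point code, aligned sequence); the representation is hypothetical.
Read = Tuple[str, str]


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def has_temporal_replicate(
    query: Read, others: List[Read], threshold: float = 97.25
) -> bool:
    """True if a read from a different time point matches `query` at >= threshold % identity.

    Reads of a different length are skipped rather than aligned (simplifying assumption).
    """
    q_time, q_seq = query
    return any(
        t != q_time and len(s) == len(q_seq) and percent_identity(q_seq, s) >= threshold
        for t, s in others
    )


if __name__ == "__main__":
    query = ("A", "C" * 40)
    # One mismatch in 40 positions gives 97.5% identity, above the threshold.
    print(has_temporal_replicate(query, [("D", "C" * 39 + "T")]))
```

Note how sensitive the cutoff is to length: one mismatch in 40 positions passes (97.5%), whereas one mismatch in 36 positions does not (about 97.22%).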

Despite the substantial differences between clades of the
VRC01 lineage, their average divergences were similar at each
time point, and they also showed similar evolutionary rates
(Figure 6A). Antibody lineages unrelated to the VRC01 lineage,
however, showed substantially different evolutionary rates. Thus,
members of a lineage appeared to share common rate
characteristics. Notably, this rate does not appear to be influenced by
the differing neutralization of autologous virus by the individual clades
(Wu et al., 2012), as might have been expected from the
co-evolution of virus and neutralizing antibody (Richman et al., 2003; Wei
et al., 2003). These findings suggest that the criteria for continued
maturation may be less stringent than the criteria for effective
neutralization and that the rate of antibody evolution may depend
on the number of cell divisions from the single originating
common ancestor B cell.

Our data suggest that the evolutionary rate of an antibody
lineage slows as it matures. Importantly, even the lowest rates
of antibody evolution that we observed were comparable to the
evolutionary rate of HIV-1 Env. For the 01+07 clade, heavy
chains had already mutated to 20% divergence at the earliest
study time point, and we observed continuous SHM to >30%
by the final time point (Figure S6A). Likewise, SHM levels
for the other heavy-chain and light-chain clades of the lineage
also increased over time, despite high initial divergence.
These data suggest that SHM may continue for as long as
antigen persists in long-term chronic infection. The rapid
evolutionary rates described here, combined with persistent antigen
during years of chronic infection, provide a mechanism to
explain how antibodies can develop the high levels of SHM
and lineage diversity required for broad and potent
neutralization of HIV-1.

Previously described antibodies from the VRC01 lineage of 
donor 45 were diverse enough that they appeared to represent 
several separate lineages (Wu et al., 2010, 2011). Here, we 
show that the diversity of the VRC01 lineage extends far beyond 
what has generally been thought possible, comprising at least 
six distinct heavy-chain clades and five light-chain clades. The 
curated deep-sequencing data fill in details of this clade struc- 
ture, without deviating from the outlines provided by functionally 
validated sequences and antibodies (Figure S7E). Each clade 
had distinctive sequence and structure characteristics. CDR 
H3 length within the lineage ranged from 9 to 23 residues, and 
CDR H3 cysteines ranged from a single cysteine, which formed 
an interloop disulfide with another CDR, to four cysteines, which 
formed two intraloop disulfides (Figure 7B). This diversity and its
continued evolution present a picture of antibody immunity in
which extraordinary variation within just a few antibody
lineages, or even a single lineage, may be of critical importance
for opposing HIV-1.

EXPERIMENTAL PROCEDURES 
Human Specimens 

Recovery of PBMCs from donor 45 (Li et al., 2007; Wu et al., 2010) has been 
described previously. Donor 45 samples from different time points were 
collected with informed consent under clinical protocols approved by the 
appropriate institutional review board (IRB). 

Isolation of Donor 45 Antibodies 

Fluorescence-activated cell sorting of antigen-specific IgG+ B cells from donor
45 PBMC and the amplification and cloning of immunoglobulin genes were 
carried out using previously described protocols (Wu et al., 2010). 

Expression and Purification of Antibodies and Fab Fragments 

Expression plasmids for heavy and kappa chains were constructed as 
described previously (Zhou et al., 2010). The expression and purification of 
antibody IgGs and preparation of Fab fragments were carried out as described 
in Extended Experimental Procedures. 

Neutralization Assessment 

Neutralization by donor 45 antibodies was measured using single-round-of-
infection HIV-1 Env pseudoviruses and TZM-bl target cells using protocols 
described in Extended Experimental Procedures. 

Crystallization, X-Ray Data Collection, Structure Determination, and 
Refinement of Donor 45 Antibodies in Complex with HIV-1 gp120 

Purification, crystallization of antibody-gp120 complexes, and data collection 
are described in Extended Experimental Procedures. All diffraction data 
were integrated and scaled with the HKL2000 suite (Otwinowski and Minor, 
1997). Structure solution, refinement, and analysis are described in Extended 
Experimental Procedures. 

454 Pyrosequencing 

454 pyrosequencing libraries from donor 45 were prepared, and 454
pyrosequencing of the PCR products was performed, as described previously
(Wu et al., 2011) with modifications described in Extended Experimental
Procedures.

Bioinformatics Analysis 

Bioinformatics analyses of the longitudinal 454 data were performed using
algorithms similar to those described previously (Zhu et al., 2013), implemented
in a new Python code base. Data-quality filtering and other new bioinformatics
methods are described in Extended Experimental Procedures.

ACCESSION NUMBERS 

Coordinates and structure factors for the eight antibody-HIV-1 gp120 complex
structures have been deposited in the Protein Data Bank with accession
numbers 4XVS, 4XVT, 4S1Q, 4S1R, 4S1S, 4XNZ, 4XMP, and 4XNY. Raw
454 data have been deposited in the NCBI Sequence Read Archive with accession
number SRP052625. In addition, 1,041 curated heavy-chain sequences
(accession numbers KP840719-KP841751), 33 functionally validated
NGS-derived heavy-chain sequences (accession numbers KP840592-KP840624),
492 curated light-chain sequences (accession numbers KP841752-KP842237),
32 functionally validated NGS-derived light-chain sequences
(accession numbers KP840625-KP840656), and 31 new probe-identified
antibodies with both heavy- and light-chain sequences (accession numbers
KP840657-KP840687 and KP840688-KP840718, respectively) have been
deposited in GenBank.
figures, and five tables and can be found with this article online at http://dx.doi.
SUMMARY

Effector CD8+ T cells (CD8 TE) play a key role during
hepatotropic viral infections. Here, we used advanced
imaging in mouse models of hepatitis B virus
(HBV) pathogenesis to understand the mechanisms
whereby these cells home to the liver, recognize
antigens, and deploy effector functions. We show that
circulating CD8 TE arrest within liver sinusoids by
docking onto platelets previously adhered to sinusoidal
hyaluronan via CD44. After the initial arrest, CD8
TE actively crawl along liver sinusoids and probe
sub-sinusoidal hepatocytes for the presence of
antigens by extending cytoplasmic protrusions through
endothelial fenestrae. Hepatocellular antigen
recognition triggers effector functions in a
diapedesis-independent manner and is inhibited by the processes
of sinusoidal defenestration and capillarization that
characterize liver fibrosis. These findings reveal the
dynamic behavior whereby CD8 TE control
hepatotropic pathogens and suggest how liver fibrosis might
reduce CD8 TE immune surveillance toward infected
or transformed hepatocytes.

INTRODUCTION 

The capacity of CD8+ T cells to protect against intracellular
pathogens is mediated by antigen (Ag)-experienced effector cells
that migrate to infected organs, recognize pathogen-derived
Ags, and perform effector functions. Priming of adaptive immune
responses during infection with intracellular bacteria or viruses
results in extensive reprogramming of T cell trafficking so that
effector cells can deal with pathogens located in peripheral
compartments (Masopust et al., 2001). Understanding of the
dynamic events leading to the generation and expansion of effector
CD8+ T cells (CD8 TE) within lymphoid organs has rapidly
improved in recent years, particularly with the use of intravital
microscopy (IVM) (Germain et al., 2012). By contrast, less is known
about the spatiotemporal aspects that govern CD8 TE migration and
function at peripheral infection sites. As a general rule, CD8 TE
are thought to recognize pathogen-infected parenchymal cells
and perform effector functions in the brain, skin, or gut following
extravasation from post-capillary venules, mediated by different
combinations of inflammation-regulated selectins, integrins,
and chemokines (Mueller, 2013).

The liver is a vital organ in which the pathogenesis and outcome of
infection by clinically relevant noncytopathic viruses, such as
hepatitis B virus (HBV) or hepatitis C virus (HCV), are determined
by CD8 TE (Guidotti and Chisari, 2006). Several observations
suggest that the liver may be an exception to the classic
multistep leukocyte migration paradigm involving rolling, adhesion,
and extravasation from postcapillary venules. First, leukocyte
adhesion is not restricted to the endothelium of post-capillary
venules and also occurs in sinusoids (Lee and Kubes, 2008).
Second, leukocyte adhesion to liver sinusoidal endothelial cells
(LSEC) often occurs independently of any notable rolling (Lee
and Kubes, 2008). Furthermore, in contrast to vascular beds in
most organs, where a continuous endothelial cell layer and a
basal membrane physically separate parenchymal cells from
circulating leukocytes, LSEC lack tight junctions as well as a
basal membrane and contain numerous fenestrae of up to
200 nm in diameter (Jacobs et al., 2010). Thus, the fenestrated
endothelial barrier of sinusoids provides the opportunity for
direct interaction of circulating cells with underlying
hepatocytes. For all these reasons, the pathways directing the
spatiotemporal regulation of CD8 TE migration and function in the liver
may differ from those in other vascular districts.

Here, we have used advanced imaging methodologies in
mouse models of HBV pathogenesis to show that hepatic CD8
TE homing is indeed independent of selectins, β2- and α4-integrins,
PECAM-1, VAP-1, Gαi-coupled chemokine receptors, or Ag
recognition, all previously thought to be variably relevant for
leukocyte trafficking in other organs (von Andrian and Mackay,
2000; Mueller, 2013). Rather, circulating CD8 TE initially arrest
within liver sinusoids by docking onto platelets that are in turn adherent
to sinusoidal hyaluronan via CD44. After the initial
platelet-dependent arrest, CD8 TE actively crawl along liver sinusoids
and extend cellular protrusions through endothelial fenestrae
to probe underlying hepatocytes for the presence of Ag.
Hepatocellular Ag recognition leading to cytokine production and
hepatocyte killing occurs in a diapedesis-independent manner, i.e.,
before CD8 TE extravasation into the parenchyma, and it is
inhibited by experimental sinusoidal defenestration and
capillarization, both of which are characteristic of liver fibrosis.

RESULTS 

CD8 TE Arrest within Liver Sinusoids Independently of Ag Recognition

To study how CD8 TE traffic within the liver and recognize Ag, we
made use of two transgenic mouse strains whose CD8+ T cells
express H2b- or H2d-restricted T cell receptors (TCRs) specific
for the HBV nucleocapsid (Cor) or envelope (Env) proteins,
respectively (Isogawa et al., 2013). Naive CD8+ T cells from these
TCR transgenic mice (referred to as Cor93 and Env28 CD8+
T cells, respectively) could be differentiated in vitro into bona
fide CD8 TE (data not shown) that, when transferred into HBV
replication-competent transgenic mice (Guidotti et al., 1995),
caused liver disease and inhibited viral replication (data not
shown) as much as previously reported polyclonal memory
HBV-specific CD8+ T cells (Iannacone et al., 2005). Further
analyses in HBV replication-competent transgenic mice revealed
that (1) passively transferred Cor93 and Env28 CD8 TE accumulate
at peak levels in the liver (see quantification of flow cytometric
data in Figure 1A) and no longer circulate throughout the hepatic
vasculature within 2 hr of intravenous injection (see quantification of
epifluorescence IVM data in Figure 1B), and (2) virtually all CD8 TE
that arrested within the 2 hr time frame (~30% of visualized CD8
TE; Figure 1C) adhered to hepatic sinusoids and not to
post-sinusoidal venules (Figures 1D and 1E; Movie S1). The transfer
of HBV-specific CD8 TE into wild-type (WT) mice previously
injected with a reporter adenovirus (Ad-HBV-GFP) that renders
HBV-replicating hepatocytes fluorescent (Sprinzl et al., 2001)
revealed that CD8 TE adhere to liver sinusoids regardless of the
location of HBV Ag-producing hepatocytes (Movie S1).
Consistent with these results, the overall accumulation of CD8 TE in
the liver was independent of Ag recognition, as virtually identical
numbers of co-transferred Cor93 and Env28 CD8 TE were
isolated 2 hr after injection from the same liver in transgenic and
nontransgenic MHC-matched and MHC-mismatched recipients
(Figure 1F; Movie S1).

Adhesion Molecules that Govern Leukocyte Trafficking in Other Organs Are Not Required for CD8 TE Accumulation in the Liver

Next, we investigated whether the hepatic CD8 Te accumulation observed within 2 hr of transfer involves the same adhesion molecules known to govern leukocyte trafficking in vascular locations of other organs (von Andrian and Mackay, 2000). Blocking PSGL-1, CD62L, CD62E, VLA-4, LFA-1, PECAM-1, and VAP-1 expressed by CD8+ T cells or LSEC (data not shown) had no impact on hepatic CD8 Te accumulation (Figures 2A-2C), and neither did CD8 Te expression of CD44 (Figure 2D), which has been implicated in hepatic neutrophil recruitment (McDonald et al., 2008). Pertussis toxin (PTX) treatment of CD8 Te, which inhibited chemokine-mediated in vitro migration (data not shown), also did not alter the in vivo hepatic homing capacity of these cells (Figure 2E). Of note, the expression of selectins, integrin ligands, and chemokines is relatively low in the liver of HBV replication-competent transgenic mice prior to CD8 Te transfer (Figures S1A and S1B), consistent with the uninflamed environment seen in experimentally infected chimpanzees prior to the arrival of HBV-specific CD8+ T cells (Wieland and Chisari, 2005). Increasing the hepatic expression of selectins, integrin ligands, and chemokines via Env28 CD8 Te injection (Figures S1A and S1B) did not affect the liver homing potential of subsequently injected Cor93 CD8 Te, whether PTX-treated or not (Figures 2F and 2G). To ascertain that the results shown in Figure 2 were not limited to in vitro differentiated CD8 Te, we verified that the liver homing potential of in vivo generated CD8 Te, isolated from the spleen of lymphocytic choriomeningitis virus (LCMV)-infected mice and injected into recipient mice, was not affected by blocking the above-mentioned adhesion molecules, even when multiple ligand-receptor pairs were blocked at one time (Figures S1C-S1G). Thus, hepatic accumulation of in vitro or in vivo differentiated CD8 Te is independent of PSGL-1, CD62L, CD62E, VLA-4, LFA-1, PECAM-1, and VAP-1, of the cells' own CD44 expression and Gαi-coupled receptor signaling capability, and of the degree of liver inflammation.
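Most accumulation panels in Figures 2, 3, and 6 report hepatic CD8 Te numbers "relative to control (control = 100%)" and express results as mean ± SEM. The sketch below illustrates that normalization under a stated assumption: each treated mouse is divided by the mean of its control group (the exact procedure is in the Experimental Procedures, not reproduced here). The function names and counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def percent_of_control(treated_counts, control_counts):
    """Express each treated mouse's hepatic CD8 Te count as a percentage of the
    mean control count, so the control group averages 100% (assumed scheme)."""
    control_mean = mean(control_counts)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return [100.0 * count / control_mean for count in treated_counts]


def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values for an SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical cell counts, one value per mouse (not data from this study).
control = [1000, 1200, 800, 1000, 1000]
treated = [950, 1100, 1050, 900, 1000]

relative = percent_of_control(treated, control)
avg, sem = mean_and_sem(relative)
print(f"{avg:.1f}% +/- {sem:.1f}% of control")  # 100.0% +/- 3.5% of control
```

A value near 100% with an SEM that overlaps control, as in this toy example, is what "no impact on hepatic accumulation" looks like in this format.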

Figure 1. CD8 Te Arrest within Liver Sinusoids Independently of Ag Recognition

Cor93 (10^) CD8 Te (red) and Env28 (10^) CD8 Te (green) were intravenously injected into HBV replication-competent transgenic mice (H2bxd) or into the indicated mouse strains.

(A) Absolute number of hepatic Cor93 (red) and Env28 (green) CD8 Te recovered at the indicated time points after injection. n = 8; results are representative of three independent experiments.

(B) Quantification of the number of circulating Cor93 (red) and Env28 (green) CD8 Te detected at the indicated time points within the field of view (see Experimental Procedures). Results are representative of at least five experiments.

(C) Quantification of the sticking fractions for Cor93 (red) and Env28 (green) CD8 Te. n = 5.

(D) Representative images of Cor93 (red) and Env28 (green) CD8 Te within the liver vasculature. Images are representative of at least ten experiments. Scale bars represent 50 µm and 20 µm (inset).

(E) Quantification of the localization (sinusoids versus post-sinusoidal venules) of adherent Cor93 (red) and Env28 (green) CD8 Te. Cells were defined as adherent when they arrested for >30 s. n = 5.

(F) Absolute number of hepatic Cor93 (red) and Env28 (green) CD8 Te recovered 2 hr after injection from the livers of WT or HBV replication-competent transgenic mice (H2b, H2d, or H2bxd). n = 8; results are representative of two independent experiments.

Results are expressed as mean ± SEM. ***p < 0.001.

See also Movie S1.

Figure 2. Adhesion Molecules that Govern Leukocyte Trafficking in Other Organs Are Not Required for CD8 Te Accumulation in the Liver

(A) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd) that were previously treated with anti-PSGL-1, anti-CD62L, or anti-CD62E Abs, relative to control (control = 100%). n = 5; results are representative of two independent experiments. Similar results were obtained with Env28 CD8 Te (data not shown).

(B) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd) that were previously treated with anti-VLA-4, anti-LFA-1, or anti-PECAM-1 Abs, relative to control (control = 100%). n = 8; results are representative of two independent experiments. Similar results were obtained with Env28 CD8 Te (data not shown).

(C) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd) that were previously treated with anti-VAP-1 Abs, relative to control (control = 100%). n = 5; results are representative of two independent experiments. Similar results were obtained with Env28 CD8 Te (data not shown).

(D) Percentage of CD44-/- Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd), relative to WT Cor93 CD8 Te (WT = 100%). n = 10; results are representative of two independent experiments.

(E) Percentage of PTX-treated Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd), relative to control Cor93 CD8 Te (control = 100%). n = 5; results are representative of two independent experiments. Similar results were obtained with Env28 CD8 Te (data not shown).

(F) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd) that were injected 24 hr earlier with 5 × 10^6 Env28 CD8 Te, relative to control (control = 100%). n = 5; results are representative of two independent experiments.

(G) Percentage of PTX-treated Cor93 CD8 Te (relative to untreated Cor93 CD8 Te controls) that accumulated within the liver 2 hr upon transfer into HBV replication-competent transgenic mice (H2bxd) that were treated 24 hr earlier with Env28 CD8 Te. n = 5; results are representative of two independent experiments.

Results are expressed as mean ± SEM.

See also Figure S1.

Figure 3. Hepatic CD8 Te Accumulation Requires Platelets that Have Adhered to Sinusoidal Hyaluronan via CD44

(A) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent x mGPIbα-/-;hGPIbαTg mice that were previously depleted of platelets (αPLT), relative to control (control = 100%). n = 4; results are representative of three independent experiments.

(B) Quantification of the sticking fraction for Cor93 CD8 Te injected into HBV replication-competent x mGPIbα-/-;hGPIbαTg mice that were platelet-depleted (αPLT) or left untreated (control). Results are representative of three experiments.

(C) Quantification of the number of Cor93 CD8 Te that were still circulating 2 hr after transfer into HBV replication-competent x mGPIbα-/-;hGPIbαTg mice that were platelet-depleted (αPLT) or left untreated (control). Results are representative of three experiments.

(D) Representative confocal micrographs of the liver of an HBV replication-competent x mGPIbα-/-;hGPIbαTg mouse (H2b) that was injected 2 hr earlier with Cor93 (red) and Env28 (green) CD8 Te. Platelets are shown in blue and sinusoids in gray. To allow visualization of intravascular events and to enhance image clarity,

(legend continued below)

Hepatic CD8 Te Accumulation Requires Platelet Adherence to Sinusoidal Hyaluronan via CD44

We previously showed that platelets facilitate the hepatic accumulation of CD8 Te observed in HBV replication-competent mice at 1-2 days post transfer (Iannacone et al., 2005). However, it remained to be determined whether a platelet-specific pathway could influence the early adhesion of CD8 Te to liver sinusoids. To address this question, we crossed two transgenic mouse strains, one HBV replication-competent and the other with platelets in which the human glycoprotein (GP) Ibα subunit replaced the mouse homolog in the GPIb-IX-V complex (mGPIbα-/-;hGPIbαTg mice) (Ware et al., 2000). The resulting HBV replication-competent mGPIbα-/-;hGPIbαTg mice can be depleted of endogenous platelets by specific anti-hGPIbα monoclonal antibodies and subsequently reconstituted with mouse platelets lacking reactivity with the depleting antibodies (Iannacone et al., 2008). Platelet depletion per se reduced the hepatic accumulation of CD8 Te by ~50% by 2 hr after transfer (Figure 3A), and this reduction was associated with diminished CD8 Te adhesion to LSEC (Figure 3B; Movie S2) and a higher number of CD8 Te circulating through the liver microvasculature (Figure 3C; Movie S2). Notably, platelets frequently adhered to LSEC prior to CD8 Te transfer, forming small and transient aggregates of up to 10-15 platelets (Movie S3). Although these aggregates covered a minute fraction of the LSEC surface area (<3%, data not shown), circulating CD8 Te docked preferentially to these sites (Movie S3), so that up to 30% of intrasinusoidal CD8 Te were found attached to platelets at the 2 hr time point by static confocal immunofluorescence analysis (Figures 3D and 3E). Dynamic epifluorescence IVM analysis of platelet-CD8 Te interaction revealed that 30 out of 48 arrested CD8 Te (>60%) docked onto platelets that were already adherent to liver sinusoids (data not shown). Using reconstitution experiments and genetic approaches to identify molecules that might mediate the capacity of platelets to support early hepatic CD8 Te accumulation, we found that platelet-derived CD44 (but not platelet-derived P-selectin, CD40L, or serotonin) facilitated CD8 Te homing to the liver (Figures 3F and S2A). Of note, CD44, which is expressed by CD8 Te (data not shown) and by platelets (Figure S2B), has the capacity to bind to hyaluronan on liver sinusoids (McDonald et al., 2008). Accordingly, platelet adhesion to liver sinusoids occurred less efficiently in CD44-/- mice than in wild-type (WT) controls (Figure S2C; note that CD44-/- mice have normal blood platelet counts and normal in vitro platelet aggregation capacity, data not shown); moreover, in vivo blockade of the CD44-hyaluronan interaction or removal of sinusoidal hyaluronan decreased early hepatic CD8 Te accumulation and the attendant liver disease (Figures 3G, S2D, and S2E). Similar results (i.e., reduced hepatic CD8 Te accumulation and reduced liver disease severity) were obtained when the CD44-hyaluronan interaction was blocked in a different model of acute viral hepatitis (Iannacone et al., 2005) that relies on endogenous, rather than adoptively transferred, CD8 Te (Figures S2F and S2G). Altogether, these results indicate that CD8 Te adhesion within liver sinusoids and the attendant immunopathology require platelet-expressed CD44 interacting with hyaluronan.
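The Figure 1E legend defines adherent cells as those arrested for >30 s, and Figure 3B reports a sticking fraction whose exact definition lies in the Experimental Procedures (not included here). A minimal sketch, assuming the sticking fraction is the proportion of tracked cells entering the field of view that meet the >30 s arrest criterion; the `TrackedCell` structure and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Arrest threshold taken from the Figure 1E legend (adherent = arrested > 30 s).
ADHESION_THRESHOLD_S = 30.0


@dataclass
class TrackedCell:
    """Hypothetical per-cell summary extracted from an IVM movie."""
    cell_id: str
    longest_arrest_s: float  # longest continuous stationary period, in seconds


def is_adherent(cell: TrackedCell, threshold_s: float = ADHESION_THRESHOLD_S) -> bool:
    """A cell counts as adherent only if it stayed arrested for longer than the threshold."""
    return cell.longest_arrest_s > threshold_s


def sticking_fraction(cells: List[TrackedCell]) -> float:
    """Assumed definition: adherent cells / all cells that entered the field of view."""
    if not cells:
        raise ValueError("no cells were tracked")
    return sum(is_adherent(c) for c in cells) / len(cells)


# Hypothetical example: four cells entering the field of view.
cells = [
    TrackedCell("c1", 0.0),    # flowed through without stopping
    TrackedCell("c2", 45.0),   # adherent
    TrackedCell("c3", 30.0),   # exactly 30 s is not > 30 s, so not adherent
    TrackedCell("c4", 120.0),  # adherent
]
print(sticking_fraction(cells))  # 0.5
```

Under this reading, platelet depletion would lower the sticking fraction (Figure 3B) and leave more cells in the circulating count (Figure 3C), which is how the two panels relate.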

CD8 Te Crawl along Liver Sinusoids until Hepatocellular Ags Are Recognized

We next took advantage of multiphoton IVM to study events that follow the initial platelet-dependent sinusoidal CD8 Te arrest and eventually lead to cognate recognition of hepatocellular Ag. To this end, HBV-specific CD8 Te were infused into mice that had been previously injected either with Ad-HBV-GFP or with adeno-associated viral vectors encoding GFP and specific HBV Ags (AAV-HBcAg-GFP or AAV-HBsAg-GFP), using experimental conditions whereby fewer than 5% of hepatocytes are transduced (data not shown). Dynamic imaging revealed that CD8 Te not adjacent to Ag-expressing hepatocytes crawl upstream and downstream in the liver sinusoids at an average speed of ~10 µm/min (note that blood flows in liver sinusoids at 100-400 µm/s; Sironi et al., 2014). By contrast, CD8 Te migrating in close proximity to Ag-expressing hepatocytes slowed down and eventually arrested (Figures 4A and S3A-S3E; Movie S4). To confirm that cessation of intrasinusoidal CD8 Te crawling is dictated by proximity to HBV Ag-expressing hepatocytes and by the capacity to recognize cognate Ag, Cor93 and Env28 CD8 Te were co-transferred into a lineage of HBV transgenic mice that express HBcAg (but not HBsAg) in 100% of hepatocytes (Guidotti et al., 1994). While Env28 CD8 Te crawled within liver sinusoids of HBcAg transgenic mice at an average speed of ~10 µm/min, the vast majority of Cor93 CD8 Te remained immotile, or moved much more slowly and had a confined motility (Figures 4B-4F, S3F, and S3G; Movie S5). Altogether, these observations indicate that CD8 Te exhibit an intrasinusoidal crawling behavior that halts when hepatocellular Ags are recognized.
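Figure 4D-4F summarizes each track by mean speed, displacement, and straightness, defined in the Experimental Procedures (not reproduced here). The sketch below uses common definitions as an assumption: mean speed is total path length divided by elapsed time, displacement is the straight-line distance between first and last positions, and straightness is displacement divided by path length. The track format and function name are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One observation: (time in min, x, y, z in um)
Point = Tuple[float, float, float, float]


def track_metrics(track: Sequence[Point]) -> Tuple[float, float, float]:
    """Return (mean speed in um/min, net displacement in um, straightness) for one cell.

    Assumed definitions: mean speed = total path length / elapsed time;
    displacement = straight-line distance from first to last position;
    straightness = displacement / path length (1.0 for a perfectly straight track).
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two time points")
    positions = [p[1:] for p in track]
    path_length = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    elapsed = track[-1][0] - track[0][0]
    if elapsed <= 0:
        raise ValueError("time points must increase")
    displacement = math.dist(positions[0], positions[-1])
    straightness = displacement / path_length if path_length > 0 else 0.0
    return path_length / elapsed, displacement, straightness


# Hypothetical crawling cell: 6 um along x, then 8 um along y, one frame per minute.
track = [(0.0, 0.0, 0.0, 0.0), (1.0, 6.0, 0.0, 0.0), (2.0, 6.0, 8.0, 0.0)]
speed, disp, straight = track_metrics(track)
print(speed, disp, round(straight, 3))  # 7.0 10.0 0.714
```

With evenly spaced frames this mean speed equals the average of per-frame speeds; with uneven spacing the two differ, so the choice should match whatever the imaging software reports. An arrested Cor93 cell would score near-zero displacement, while a cell crawling back and forth would keep a nonzero speed but low straightness.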

Of note, HBV Ag expression in HBcAg transgenic mice, as in HBV replication-competent mice, is restricted to hepatocytes (Isogawa et al., 2013); in keeping with this, only hepatocytes isolated from these HBV transgenic mouse lineages presented Ag in vitro to HBV-specific CD8 Te, whereas LSECs, Kupffer cells, and dendritic cells isolated from their livers or liver-draining lymph nodes did not (Figure S4 and data not shown). These results are consistent with recent reports showing that HLA class I-restricted HBV epitopes are exclusively expressed by human hepatocytes during natural HBV infection (Ji et al., 2012).

CD8 Te Recognize Hepatocellular Ags and Perform Effector Functions in a Diapedesis-Independent Manner

The foregoing results suggest that CD8 Te may recognize hepatocellular Ag while still in the intravascular space. To test this hypothesis, we set up an immunofluorescence staining method allowing the detection of Ag-recognizing (i.e., IFN-γ-producing) cells with respect to their localization within the liver. Two hours after the co-transfer of Cor93 and Env28 CD8 Te into H2b HBV replication-competent or HBcAg transgenic mice, both populations were contained within the hepatic sinusoidal lumen, but only MHC-matched Cor93 CD8 Te expressed IFN-γ (Figures 5A, S5A, and S5B; Movie S6; data not shown). IFN-γ expression by intravascular CD8 Te adjacent to hepatocytes expressing cognate Ag was also confirmed in WT mice that were injected with Ad-HBV-GFP prior to Cor93 CD8 Te transfer (Figures S5C and S5D). That CD8 Te performed effector functions without extravasating is further indicated by the detection of apoptotic hepatocytes preferentially juxtaposed to intravascular MHC-matched (rather than MHC-mismatched) CD8 Te (Figure 5B; Movie S6; quantification in Figure S6). Confocal 3D reconstructions of these experiments occasionally revealed small CD8 Te protrusions that appeared to penetrate the sinusoidal wall in order to contact underlying hepatocytes (Figure 5B; Movie S6). Analysis of the hepatic localization of CD8 Te in transgenic and nontransgenic MHC-matched and MHC-mismatched recipients at 2 and 4 hr after CD8 Te transfer revealed that only CD8 Te capable of recognizing Ag eventually extravasate (Figures S7A-S7C). These observations, coupled with the finding that the ratio between intravascular and extravascular CD8 Te performing effector functions (i.e., expressing IFN-γ or triggering hepatocellular apoptosis) decreases over time (Figures S7D and S7E), indicate that extravasation follows rather than precedes Ag recognition. Together, the above results indicate that CD8 Te do not need to extravasate in order to recognize hepatocellular Ags and to perform effector functions.

(Figure 3 legend, continued from above) the transparency of the sinusoidal rendering was set to 45% (left panels) and the transparency of the cell rendering to 48% (right panels). Scale bars represent 3 µm (left panels) and 1.5 µm (right panels).

(E) Percentage of Cor93 (red) and Env28 (green) CD8 Te that were adherent to endogenous platelets in the liver of an HBV replication-competent x mGPIbα-/-;hGPIbαTg mouse (H2b) that was injected 2 hr earlier with these cells. 300 cells of each type from 30 random 40× fields of view were analyzed. Results are representative of two independent experiments.

(F) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent x mGPIbα-/-;hGPIbαTg mice that were previously depleted of platelets (αPLT) and then injected with PBS, WT platelets, P-selectin-/- platelets, CD40L-/- platelets, or CD44-/- platelets, relative to control (control = 100%). n = 6; results are representative of three independent experiments. For the role of platelet-derived serotonin, see Figure S2.

(G) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr upon transfer into HBV replication-competent x mGPIbα-/-;hGPIbαTg mice that were previously injected with anti-CD44 Abs (clones KM81 or IM7, which respectively do or do not block the capacity of CD44 to bind to hyaluronan [HA]) or with hyaluronidase (HA-ase), relative to control (control = 100%). n = 7; results are representative of two independent experiments.

Results are expressed as mean ± SEM. *p < 0.05, ***p < 0.001.

See also Figure S2 and Movies S2 and S3.

Figure 4. CD8 Te Crawl along Liver Sinusoids until Hepatocellular Ags Are Recognized

(A) Intrasinusoidal crawling of CD8 Te, visualized by multiphoton IVM (still image from Movie S4) in the liver of a WT mouse that was injected with Ad-HBV-GFP 2 days prior to Cor93 CD8 Te transfer. The movie was recorded ~1 hr after Cor93 CD8 Te transfer. Red lines denote tracks of individual Cor93 CD8 Te. Sinusoids are in gray. Scale bar represents 50 µm. Similar results were obtained when Env28 CD8 Te were transferred into Ad-HBV-GFP-injected mice or when Cor93 or Env28 CD8 Te were transferred into WT mice previously injected with AAV-HBcAg-GFP or AAV-HBsAg-GFP, respectively (data not shown).

(B) Intrasinusoidal crawling of CD8 Te, visualized by multiphoton IVM (still image from Movie S5) in the liver of an HBcAg transgenic mouse (H2b) that was injected with H2b-restricted Cor93 (red) and H2d-restricted Env28 (green) CD8 Te. Red and green lines denote tracks of individual Cor93 and Env28 CD8 Te, respectively. Sinusoids are in gray and hepatocellular nuclei are in blue. Scale bar represents 50 µm.

(legend continued below)

CD8 Te Recognize Hepatocellular Ags through Fenestrations in Liver Sinusoidal Endothelial Cells

The above-mentioned results and the unique anatomy of the liver microvasculature suggest that CD8 Te cellular protrusions might reach underlying hepatocytes through anatomical discontinuities in the hepatic sinusoid created by fenestrae that penetrate the endothelial cell lining. To test this hypothesis, we established a correlative technique that combines the specificity of 3D confocal fluorescence microscopy with the resolution of transmission electron tomography; this was carried out on liver sections from HBcAg transgenic mice whose LSEC are fluorescent (see Experimental Procedures). While confirming at higher resolution that CD8 Te protrusions penetrate the sinusoidal wall, we found that these structures correspond to cytoplasmic CD8 Te extensions that protrude through the LSEC fenestrae and contact the hepatocyte membrane over relatively large surface areas (Figures 5C and 5D; Movie S7). Although the fixation procedure of this correlative technique does not permit Ab staining (and thus cannot detect Ag recognition markers), these results are compatible with the formation of an immunological synapse between intravascular CD8 Te and hepatocytes. The observation that CD8 Te also extend cytoplasmic protrusions penetrating the sinusoidal barrier of WT mice (data not shown) suggests that contacts with sub-sinusoidal hepatocytes might be the mechanism whereby intravascularly crawling CD8 Te probe the liver for the presence of hepatocellular Ag.

To test the biological significance of CD8 Te-mediated Ag recognition through sinusoidal fenestrae, HBV replication-competent transgenic mice were chronically exposed to low doses of arsenite (Straub et al., 2007), a treatment that reduces liver porosity (i.e., the number and size of sinusoidal endothelial cell fenestrae), recapitulating the sinusoidal defenestration that characterizes liver fibrosis (Friedman, 2004). Arsenite treatment decreased liver porosity by ~5-fold (Figures 6A-6C) without affecting either hepatic HBV Ag expression or the capacity of hepatocytes to present Ag to CD8 Te in vitro (data not shown). This same treatment did not alter the hepatic homing of CD8 Te, which is platelet-dependent, but it reduced their in vivo Ag recognition capacity, as evidenced by the reduction of both IFN-γ expression and sALT elevation (Figures 6D-6J). To rule out off-target effects of arsenite treatment, we infected arsenite-treated mice with LCMV, a virus whose tropism in the liver is mostly restricted to intravascular Kupffer cells (Guidotti et al., 1999); subsequently transferred LCMV-specific CD8 Te should therefore recognize infected cells independently of contacts through sinusoidal endothelial fenestrae. Indeed, arsenite exposure impacted neither the homing nor the Ag-recognition capacity of LCMV-specific CD8 Te (Figures 6K and 6L).
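Porosity is defined in the Figure 6C legend as the percentage of liver endothelial surface area occupied by fenestrae, and it depends on both the number and the size of fenestrae. A minimal sketch of that calculation, assuming each fenestra measured on a scanning electron micrograph is approximated as a circle of its measured diameter; the function name and all numbers are hypothetical, chosen only to show how a ~5-fold drop could arise from fewer fenestrae alone.

```python
import math
from typing import Iterable


def porosity_percent(fenestra_diameters_um: Iterable[float], endothelial_area_um2: float) -> float:
    """Percentage of analysed endothelial surface occupied by fenestrae.

    Assumption: each fenestra is approximated as a circle of its measured diameter.
    """
    if endothelial_area_um2 <= 0:
        raise ValueError("endothelial area must be positive")
    open_area = sum(math.pi * (d / 2.0) ** 2 for d in fenestra_diameters_um)
    return 100.0 * open_area / endothelial_area_um2


# Hypothetical SEM fields of 10 um^2 each (illustrative numbers, not study data).
control = porosity_percent([0.1] * 20, 10.0)
arsenite = porosity_percent([0.1] * 4, 10.0)
print(f"control {control:.2f}%, arsenite {arsenite:.2f}%, fold reduction {control / arsenite:.1f}")
```

Because open area scales with the square of the diameter, shrinking fenestrae reduces porosity faster than removing the same proportion of them, so the same fold change can reflect different mixes of fewer and smaller fenestrae.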

Next, we evaluated whether the deposition of extracellular matrix in the space of Disse (a process known as liver capillarization, which creates a physical barrier between sinusoidal fenestrae and hepatocellular membranes and is frequently found in fibrotic livers; Friedman, 2004) could also limit Ag recognition by intravascular CD8 Te. To this end, we transferred HBV-specific CD8 Te to HBV replication-competent transgenic mice with liver fibrosis as a consequence of chronic exposure to carbon tetrachloride (Figures 7A-7C). This treatment did not alter the capacity of CD8 Te to home to the liver, but it significantly impaired their ability to recognize Ag on hepatocytes, though not on intravascular Kupffer cells (Figures 7D-7H).
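Part of the Ag-recognition readout in these experiments is hepatic IFN-γ mRNA measured by qPCR and expressed as fold induction over PBS-injected mice after normalization to GAPDH (Figure 6E legend). The legend does not state the quantification method; the sketch below assumes the widely used 2^(−ΔΔCt) method, which presumes roughly equal, near-100% amplification efficiency for both genes. The function name and Ct values are hypothetical.

```python
def fold_induction(ct_target: float, ct_reference_gene: float,
                   ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Relative expression by the 2^(-ddCt) method (assumes ~100% PCR efficiency).

    ct_target / ct_reference_gene: IFN-gamma and GAPDH Ct values in a test liver.
    *_calibrator: the same two Ct values in the calibrator group (here, PBS-injected mice).
    """
    delta_ct_sample = ct_target - ct_reference_gene
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (not study data): IFN-gamma crosses threshold 6 cycles
# earlier, relative to GAPDH, in the test liver than in the PBS calibrator.
print(fold_induction(24.0, 18.0, 30.0, 18.0))  # 64.0
```

Each cycle of difference corresponds to a 2-fold change, so reduced Ag recognition in defenestrated or fibrotic livers would appear as a smaller ΔΔCt and a lower fold induction.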

Altogether, these results indicate that CD8 Te recognize hepatocellular Ags through sinusoidal endothelial fenestrations and suggest a mechanism whereby liver fibrosis reduces T cell immune surveillance toward infected or transformed hepatocytes.

DISCUSSION 

In this study, we coupled advanced imaging techniques and models of HBV pathogenesis to reveal previously unappreciated determinants that regulate the migration, Ag recognition, and effector function of CD8 Te within the liver. In contrast to most other organs, where CD8 Te arrest is mainly restricted to post-capillary venules and promoted by inflammation (von Andrian and Mackay, 2000), CD8 Te circulating through the liver initially arrest within sinusoids, and they do so independently of selectins, Gαi-coupled chemokine receptors, β2- and α4-integrins, PECAM-1, and VAP-1. Furthermore, sinusoidal arrest and early accumulation of CD8 Te occur independently of their capacity to recognize hepatocellular Ag.

We previously used similar models of HBV pathogenesis to show that platelets are involved in hepatic CD8 Te accumulation occurring 1-2 days post transfer (Iannacone et al., 2005), but the molecular mechanisms and spatiotemporal dynamics underlying these observations have remained elusive. Herein, we extended these observations by showing that (1) platelets adhere to LSEC even under steady-state conditions, leading to the transient formation of small intrasinusoidal aggregates on LSECs; (2) this process is mediated by platelet-expressed CD44 interacting with LSEC hyaluronan; and (3) intrasinusoidal platelet aggregates function as preferential docking sites for circulating CD8 Te. Other platelet molecules previously implicated in the cross-talk between platelets and adaptive immunity, such as CD40L (Iannacone et al., 2008) or serotonin (Lang et al., 2008), were shown to be dispensable for these processes. Notably, platelets have been shown to form intrasinusoidal aggregates on the surface of bacterially infected Kupffer cells, possibly as a pathogen-trapping mechanism promoting immune-mediated clearance (Wong et al., 2013). In contrast to that study, we saw no evidence for a preferential formation of platelet aggregates on Kupffer cells in our system (data not shown), where Kupffer cells are neither infected with bacteria nor targeted by CD8 Te. These results, together with the finding that Kupffer cell depletion does not affect hepatic CD8 Te accumulation in these mouse models (Sitia et al., 2011), indicate that platelet-Kupffer cell interaction does not play a role in the hepatic homing of CD8 Te targeting hepatocellular Ags.

(Figure 4 legend, continued) (C) Still image (large left panel) and time-lapse recording (small right panels) in the liver of an HBcAg transgenic mouse (H2b) that was injected with Cor93 (red) and Env28 (green) CD8 Te. Red and green lines denote tracks of individual Cor93 and Env28 CD8 Te, respectively. Sinusoids are in gray. Elapsed time in minutes:seconds. Scale bars represent 15 µm (left) and 10 µm (right).

(D-F) Mean speed (D), displacement (E), and straightness (F) (see Experimental Procedures) of individual Cor93 (red) and Env28 (green) CD8 Te in the liver of an HBcAg transgenic mouse (H2b). Data are representative of two independent experiments.

Results are expressed as mean ± SEM. ***p < 0.001.

See also Figures S3 and S4 and Movies S4 and S5.

Figure 5. CD8 Te Recognize Hepatocellular Ags and Perform Effector Functions in a Diapedesis-Independent Manner

(A) Representative confocal micrographs of the liver of an HBV replication-competent transgenic mouse (H2b) that was injected 2 hr earlier with Cor93 (red) and Env28 (green) CD8 Te. Sinusoids are shown in gray and IFN-γ in yellow. To allow visualization of intravascular events and to enhance image clarity, the transparency of the sinusoidal rendering in the right panel was set to 70% and that of T cells to 60%. Scale bars represent 4 µm. See also Movie S6 and Figure S5. Similar results were obtained in similarly treated HBcAg transgenic mice (data not shown).

(B) Representative confocal micrographs of the liver of an HBV replication-competent transgenic mouse (H2b) that was injected 2 hr earlier with Cor93 CD8 Te (red). Sinusoids are shown in gray and caspase 3 in brown. To allow visualization of intravascular events and to enhance image clarity, the transparency of the sinusoidal rendering in the right panel was set to 50%. Scale bars represent 5 µm. See also Movie S6 and Figure S6. Similar results were obtained in similarly treated HBcAg transgenic mice (data not shown).

(legend continued below)

It is noteworthy that experiments using anti-GPIbα Abs to deplete circulating platelets by roughly 50-fold (reducing normal platelet counts from ~10^6 platelets/µl to ~2 × 10^4 platelets/µl) decreased the early hepatic accumulation of CD8 Te only 2-fold. These results suggest either that the number of circulating platelets vastly exceeds the number required to efficiently arrest CD8 Te in the liver or that a fraction of CD8 Te home to the liver independently of platelets. Although the failure to further reduce hepatic CD8 Te accumulation by treating platelet-depleted animals with CD44-blocking Abs supports the latter hypothesis, future studies are needed to settle this issue definitively.
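The mismatch between the magnitude of platelet depletion and its effect on CD8 Te homing can be made explicit with the figures given in the text (platelet counts above; ~50% residual accumulation from Figure 3A):

```latex
\frac{\text{platelets (untreated)}}{\text{platelets (depleted)}}
  \approx \frac{10^{6}\,\mu\text{l}^{-1}}{2 \times 10^{4}\,\mu\text{l}^{-1}} = 50,
\qquad
\frac{\text{hepatic CD8 Te (control)}}{\text{hepatic CD8 Te (depleted)}}
  \approx \frac{100\%}{50\%} = 2
```

A 50-fold reduction in platelets thus produces only a 2-fold reduction in early hepatic accumulation. This arithmetic is compatible with both explanations offered above and cannot by itself distinguish between them.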

Although our data have unambiguously identified the molecular interaction by which platelets adhere to LSEC, the mechanism(s) supporting CD8 Te docking onto platelet aggregates remain unexplained. Of note, platelets possess a large array of surface molecules mediating adhesion to endothelial cells, and, among these, only P-selectin has a known ligand (i.e., PSGL-1) that is also expressed by CD8 Te (Borges et al., 1997). As passive PSGL-1 neutralization or reconstitution with P-selectin-deficient platelets did not alter the capacity of CD8 Te to accumulate intrahepatically, the interaction between P-selectin on platelets and PSGL-1 on CD8 Te does not appear to be operative in our system. Whether other constitutively expressed or activation-induced platelet molecules contribute, directly or indirectly (via the formation of molecular bridges with CD8 Te ligands), to CD8 Te docking onto platelet aggregates remains to be determined. An attractive alternative hypothesis, which is currently under investigation, is that intrasinusoidal platelet aggregates alter local flow dynamics in ways that force CD8 Te to slow down and then engage platelets and/or LSEC via non-covalent interactions.

Following the initial interaction with platelets, CD8 Te were shown to crawl along liver sinusoids independently of the direction of blood flow and at a speed 500- to 1,000-fold slower than sinusoidal flow (Sironi et al., 2014). This multidirectional intrasinusoidal crawling behavior is reminiscent of what has been observed for CD1d-restricted NKT cells patrolling the liver microvasculature (Geissmann et al., 2005). While the molecular mechanisms mediating CD8 Te crawling are unknown, preliminary data show that chemokine cues might not be involved in this process (unpublished data). Whether hepatic CD8 Te crawl along physical structures (as described for naive T cells migrating along the fibroblastic reticular cell network in the T cell area of lymph nodes [Bajenoff et al., 2006] or for effector T cells migrating along the myeloid scaffold delineating hepatic granulomas [Egen et al., 2008]), or whether noradrenergic neurotransmitters from sympathetic nerves can also modulate intrasinusoidal CD8 Te motility (as described for intrasinusoidal hepatic NKT cells [Wong et al., 2011]), remains to be determined.

Importantly, our data show that the intrasinusoidal crawling behavior represents a form of immune surveillance, since it occurs independently of the presence of cognate Ag and ceases following hepatocellular Ag recognition. Indeed, virus-specific CD8 Te were shown to recognize hepatocellular Ags and to perform pathogenic functions (i.e., they produced IFN-γ and killed HBV-expressing hepatocytes) while still in the intravascular space. These processes were mediated by CD8 Te extending cellular protrusions through sinusoidal endothelial cell fenestrae, producing contact sites with the hepatocyte membrane that are compatible with the establishment of an immunological synapse (Dustin and Groves, 2012). Of note, this Ag-probing activity by intravascular CD8 Te would require the formation and retraction of cellular protrusions, events that might seem at odds with average CD8 Te migration rates of ~10 µm/min. The high variability among CD8 Te velocities (between 1 and 30 µm/min; see Figure 4D) is compatible with the hypothesis that slower cells are more active in extending and retracting trans-endothelial protrusions than faster-moving ones. The notion that circulating leukocytes can interact with hepatocytes through endothelial fenestrations has been proposed in previous studies (Ando et al., 1994; Geissmann et al., 2005; Warren et al., 2006). Our results reveal that this process has functional significance, as reducing sinusoidal porosity or creating a physical barrier between sinusoidal fenestrae and hepatocellular membranes inhibited hepatocellular Ag recognition by CD8 Te. These results also suggest a potential mechanism whereby liver fibrosis (a condition promoting both sinusoidal defenestration and capillarization) might reduce CD8 Te immune surveillance toward infected or transformed hepatocytes and, in the latter case, favor the development and progression of hepatocellular carcinoma.

Another novel finding from our studies is that CD8 Te extrav- 
asation from the liver microcirculation follows, rather than 



(C) Correlative confocal and transmission electron microscopy of the liver of an HBcAg transgenic mouse whose LSEC express membrane-targeted tdTomato 
(see Experimental Procedures) that was injected 30 min earlier with Cor93 CDS Te. Left: overlay of the Cor93 CDS Te and LSEC fluorescence (red and green, 
respectively) with the electron micrograph of the same section. Right: electron micrograph alone. Scale bars represent 2 ^im. 

(D) Transmission electron tomograms of five selected serial slices from the area delineated by the red inset in (C). The numbers indicate the z-distance from the 
middle section. Cor93 CDS Te and LSEC are indicated by the red and green overlay, respectively. Scale bars represent 500 nm. See Movie S7 for the complete 
tomographic reconstruction of 2S9 sections with a 1 .95 nm z-step. 

See also Figures S5, S6, and S7 and Movies S6 and S7. 
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Figure 6. Reducing Sinusoidal Porosity Limits Hepatocellular Ag Recognition by CDS Te 

(A and B) Representative scanning electron micrographs from liver sections of control (A) or arsenite-treated (B) HBV replication-competent transgenic mice
(H-2bxd). Yellow dotted lines denote sinusoidal edges. Scale bars represent 1 μm.

(C) Porosity (the percentage of liver endothelial surface area occupied by fenestrae) was measured in control and arsenite-treated mice, n = 3; results are 
representative of two independent experiments. 

(D) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr after transfer into HBV replication-competent transgenic mice (H-2bxd) that were previously
treated with arsenite, relative to control (control = 100%). n = 20; results are representative of two independent experiments. Similar results were obtained with
Env28 CD8 Te (data not shown).

(E) Total hepatic RNA from the same mice described in (D) was analyzed for the expression of IFN-γ by qPCR. Results are expressed as fold induction (f.i.) over
HBV replication-competent transgenic mice injected with PBS, after normalization to the housekeeping gene GAPDH.

precedes, hepatocellular Ag recognition and effector function.
The fate of extravasated CD8 Te remains ill defined. One
possibility is that they invade hepatocytes, enter endosomal/lysosomal
compartments, and are degraded, a process of
suicidal emperipolesis that was recently described for naive
CD8+ T cells undergoing primary activation in the liver (Benseler
et al., 2011). We are attracted by the hypothesis that
CD8 Te extravasation into the liver parenchyma (whether
through emperipolesis or other yet undefined mechanisms)
might actually represent a way to limit Ag recognition and,
therefore, to prevent excessive liver damage caused by CD8
Te. This concept is supported by the notion that hepatocellular
MHC-I expression is polarized and localized predominantly to
the portion of the basolateral membrane facing the sinusoidal
lumen (Warren et al., 2006), so that MHC-I-peptide complexes
might be less accessible to CD8 Te residing in extravascular
spaces.

In summary, the data presented here reveal novel dynamic determinants
regulating trafficking and effector function of CD8 Te
after hepatocellular Ag recognition. This is particularly relevant
for the pathogenesis of viral infections such as those caused
by HBV and HCV, noncytopathic pathogens that replicate selectively
in the hepatocyte and cause acute or chronic liver disease
that is triggered by virus-specific CD8 Te (Guidotti and Chisari,
2006). By extension, it is conceivable that similar mechanisms
may also be operative for bacterial and parasitic infections that
target hepatocytes.

EXPERIMENTAL PROCEDURES 
Mice 

Mice were obtained from various sources and maintained in SPF conditions.
All experimental animal procedures were approved by the Institutional Animal
Committee of the San Raffaele Scientific Institute. For details on mouse lines
and BM chimera generation, see the Extended Experimental Procedures.

Viruses and Vectors 

For details on adenoviral vectors, adeno-associated viruses, and LCMV, see
the Extended Experimental Procedures. All infectious work was performed
in designated BSL-2 or BSL-3 workspaces, in accordance with institutional
guidelines.

Generation of Effector CD8+ T Cells and Adoptive Transfer

In vitro generation of CD8+ T cells (CD8 Te) was performed basically as
described (Manjunath et al., 2001). A total of 10^ cells of each cell type were
injected intravenously (i.v.) into recipient animals. In imaging experiments,
CD8 Te were labeled with 2.5 μM CMFDA, 2.5 μM CFSE, 7.5 μM CMTPX,
10 μM CMTMR, or 2.5 μM BODIPY 630/650-X (Life Technologies) for 20 min
at 37°C in plain RPMI prior to adoptive transfer. For details, see the Extended
Experimental Procedures.



Blocking Abs and Hyaluronidase Treatment 

The following blocking Abs were injected i.v. 2 hr prior to T cell transfer: anti-PSGL-1
(clone 4RA10; BioXCell; 200 μg/mouse), anti-CD62L (clone MEL-14;
BioXCell; 100 μg/mouse), anti-CD62E (clone 10E9.6; BD PharMingen;
100 μg/mouse), anti-VLA-4 (clone PS/2; BioXCell; 100 μg/mouse), anti-LFA-1
(clone M17/4; BioXCell; 100 μg/mouse and clone GAME46; BD
PharMingen; 25 μg/mouse), anti-PECAM-1 (clone MEC13.3; BioLegend;
100 μg/mouse), anti-VAP-1 (80 μg of clone 7-88 + 80 μg of clone 7-106/mouse,
both provided by S. Jalkanen), anti-CD44 (clone KM81, blocking CD44 binding
to hyaluronan [Zheng et al., 1995]; Cedarlane; 20 μg/mouse), and anti-CD44 (clone
IM7, not interfering with CD44 binding to hyaluronan [Zheng et al., 1995];
BioXCell; 100 μg/mouse). In indicated experiments, mice were injected
intraperitoneally (i.p.) with 20 U/g hyaluronidase type IV (Sigma-Aldrich), as
described (Johnsson et al., 1999).

Env28 CD8 Te-Mediated Induction of Liver Inflammation

In order to increase the hepatic expression of selectins, integrin ligands, and
chemokines (experiments described in Figures 2F, 2G, and S1), HBV
replication-competent transgenic mice were injected i.v. with 5x10® Env28 CD8 Te
24 hr prior to the injection of 10^ Cor93 CD8 Te.

Treatment with Pertussis Toxin and Chemotaxis Assay 

The role of Gαi signaling was assessed by incubating CD8 Te (10^ cells/ml) for
2 hr at 37°C with 100 ng/ml pertussis toxin (Merck). For details on the chemotaxis
assay, see the Extended Experimental Procedures.

Depletion and Transfusion of Platelets 

HBV replication-competent x mGP-Ibα-/-;hGP-IbαTg mice were injected i.v.
with 80 μg of clone LJ-P3 (a monoclonal Ab that recognizes the platelet-specific
human GP-Ibα) at least 3 hr prior to further experimental manipulation,
as described (Iannacone et al., 2008). Platelet transfusion was performed
as described (Iannacone et al., 2008), with each mouse receiving a single i.v.
injection of 6 x 10® platelets.

Treatment with Sodium Arsenite or Carbon Tetrachloride

In indicated experiments, mice were treated with sodium arsenite (250 ppb in
drinking water ad libitum) for 10 weeks. In other experiments, mice were fed by
oral gavage with a solution of carbon tetrachloride (CCl4) in peanut oil (Sigma-Aldrich)
at a final dose of 0.7 mg/g of body weight. CCl4 was administered
twice a week for 12 weeks, after which the treatment was suspended for a
washout period of 4 weeks.
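The CCl4 dose above is weight based (0.7 mg per g of body weight), so the amount given differs from mouse to mouse. The sketch below shows that per-mouse arithmetic. This section does not give the stock concentration of the CCl4-in-peanut-oil solution, so the `stock_mg_per_ml` value and both helper names are hypothetical.

```python
CCL4_DOSE_MG_PER_G = 0.7  # dose stated in the procedure: 0.7 mg CCl4 per g body weight


def ccl4_dose_mg(body_weight_g: float) -> float:
    """Mass of CCl4 (mg) for one administration at 0.7 mg/g body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return CCL4_DOSE_MG_PER_G * body_weight_g


def gavage_volume_ml(body_weight_g: float, stock_mg_per_ml: float) -> float:
    """Volume of CCl4/peanut-oil solution that delivers that dose.

    stock_mg_per_ml is a hypothetical stock concentration (mg CCl4 per ml).
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return ccl4_dose_mg(body_weight_g) / stock_mg_per_ml


# Example: a 25 g mouse and a hypothetical 100 mg/ml stock
dose = ccl4_dose_mg(25.0)               # 17.5 mg
volume = gavage_volume_ml(25.0, 100.0)  # 0.175 ml
```

Under this scheme the twice-weekly schedule only changes how often the dose is given. The volume per gavage still depends only on body weight and stock concentration.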

Cell Isolation and Flow Cytometry 

Single-cell suspensions of livers, spleens, and lymph nodes were generated
as described (Iannacone et al., 2005; Tonti et al., 2013). For details on flow
cytometric analyses, see the Extended Experimental Procedures.

Isolation of Primary Hepatocytes, LSEC, Kupffer Cells, Intrahepatic
Dendritic Cells, and Dendritic Cells from Liver-Draining Lymph
Nodes

Primary hepatocytes, LSEC, Kupffer cells, and dendritic cells were isolated
essentially as described (Isogawa et al., 2013). Hepatic lymph node dendritic
cells were isolated by positive selection using biotinylated anti-CD11c and
Streptavidin Magnetic Particles (BD Biosciences). For details, see the Extended
Experimental Procedures.



(F) ALT activity measured in the serum of the same mice described in (D).

(G and H) Representative confocal micrographs from the same mice described in (D). Cor93 CD8 Te are shown in red, sinusoids in gray, and IFN-γ in yellow.
Arrowheads denote IFN-γ+ cells. Scale bars represent 20 μm.

(I) The percentage of Cor93 CD8 Te that stained positive for IFN-γ was quantified in liver sections from the same mice described in (D). n = 90.

(J) IFN-γ mean fluorescence intensity (MFI) of Cor93 CD8 Te was quantified in liver sections from the same mice described in (D). n = 90.

(K) Percentage of GP33 CD8 Te that accumulated within the liver 2 hr after transfer into LCMV-infected mice that were previously treated with arsenite, relative to
control (control = 100%). n = 5; results are representative of two independent experiments.

(L) Total hepatic RNA from the same mice described in (K) was analyzed for the expression of IFN-γ by qPCR. Results are expressed as fold induction (f.i.) over
LCMV-infected mice injected with PBS, after normalization to the housekeeping gene GAPDH.

Results are expressed as mean ± SEM. **p < 0.01, ***p < 0.001.
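Panel (C) of Figure 6 defines porosity as the percentage of liver endothelial surface area occupied by fenestrae. The sketch below shows that calculation. It assumes that each fenestra is treated as a circle of measured diameter, which the legend does not state. The function name and the example numbers are hypothetical.

```python
import math


def sinusoidal_porosity(fenestra_diameters_nm, endothelial_area_nm2):
    """Percentage of endothelial surface area occupied by fenestrae.

    Hypothetical helper: each fenestra is approximated as a circle,
    so its area is pi * (d / 2) ** 2 (nm^2).
    """
    if endothelial_area_nm2 <= 0:
        raise ValueError("endothelial area must be positive")
    fenestral_area = sum(math.pi * (d / 2) ** 2 for d in fenestra_diameters_nm)
    return 100.0 * fenestral_area / endothelial_area_nm2


# Example: ten 100 nm fenestrae in 1 μm^2 (1e6 nm^2) of endothelium
porosity = sinusoidal_porosity([100.0] * 10, 1e6)
```

Because area scales with the square of the diameter, a treatment that mainly shrinks fenestrae can lower porosity about as much as one that lowers their number.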
Figure 7. Liver Fibrosis Limits Hepatocellular Ag Recognition by CD8 Te

(A and B) Representative Sirius Red (left) or transmission electron (right) micrographs from liver sections of control (A) or carbon tetrachloride (CCl4)-treated (B)
HBV replication-competent transgenic mice. Sirius Red staining is shown in red. Scale bars represent 100 μm (Sirius Red) and 1.5 μm (transmission electron
micrographs). Red and yellow dotted lines denote LSEC and the hepatocyte body, respectively. SD, space of Disse; H, hepatocyte.

(C) Quantification of Sirius Red staining in HBV replication-competent transgenic mice that were treated or not with CCl4. n = 3; results are representative of two
independent experiments.

(D) Percentage of Cor93 CD8 Te that accumulated within the liver 2 hr after transfer into HBV replication-competent transgenic mice (H-2bxd) that were previously
treated with CCl4, relative to control (control = 100%). n = 15; results are representative of two independent experiments.

Imaging Studies

For details on histochemistry, confocal immunofluorescence histology, intravital
epifluorescence and multiphoton microscopy, electron microscopy,
and correlative light and electron tomography, see the Extended Experimental
Procedures.

DNA, RNA, and Biochemical Analyses 

For details on molecular and biochemical analyses, see the Extended Exper- 
imental Procedures. 

Statistical Analyses 

Results are expressed as mean ± SEM. All statistical analyses were performed
in Prism 5 (GraphPad Software). Means of two groups were compared
with a two-tailed t test. Means of three or more groups were compared with
one-way or two-way ANOVA with a Bonferroni post-test.
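The idea behind a Bonferroni correction can be shown in a few lines: multiply each raw P value by the number of comparisons m and cap the result at 1. This is a generic illustration, not GraphPad Prism's implementation. Prism's post-test derives each comparison from the fitted ANOVA, so its P values will not exactly match a naive correction of separate tests. The function names and P values below are hypothetical. The star thresholds follow the figure legends (**p < 0.01, ***p < 0.001).

```python
def bonferroni_adjust(p_values):
    """Bonferroni correction: multiply each raw P value by the number of
    comparisons m and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_stars(p):
    """Markers used in the figure legends: ** p < 0.01, *** p < 0.001.
    The legends define no marker for other values, so return ''."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    return ""


# Hypothetical raw P values from three pairwise comparisons after an ANOVA
raw = [0.0002, 0.004, 0.03]
adjusted = bonferroni_adjust(raw)
print([significance_stars(p) for p in adjusted])  # ['***', '', '']
```

Note that the second comparison (raw p = 0.004) loses its "**" marker after correction. This is the intended conservatism of the method.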

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven
figures, and seven movies and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.03.005.
SUMMARY

Mycobacterium tuberculosis and Staphylococcus 
aureus secrete virulence factors via type VII protein 
secretion (T7S), a system that intriguingly requires 
all of its secretion substrates for activity. To gain 
insights into T7S function, we used structural ap- 
proaches to guide studies of the putative translocase 
EccC, a unique enzyme with three ATPase domains, 
and its secretion substrate EsxB. The crystal struc- 
ture of EccC revealed that the ATPase domains are 
joined by linker/pocket interactions that modulate 
its enzymatic activity. EsxB binds via its signal 
sequence to an empty pocket on the C-terminal 
ATPase domain, which is accompanied by an in- 
crease in ATPase activity. Surprisingly, substrate 
binding does not activate EccC allosterically but, 
rather, by stimulating its multimerization. Thus, the 
EsxB substrate is also an integral T7S component,
which illuminates a mechanism that helps to explain the
interdependence of substrates and suggests a model in
which binding of substrates modulates their coordinate
release from the bacterium.

INTRODUCTION 

While all cells secrete proteins through the conserved Sec
system, bacteria also utilize specialized secretion systems to
interact with their environment (Waksman, 2012). These systems
are particularly important for bacterial pathogens, as they allow
for regulated secretion of virulence factors into eukaryotic cells
during infection. The type VII secretion (T7S) system, the only
specialized secretion system found exclusively in Gram-positive
bacteria (Huppert et al., 2014; Waksman, 2012), is required for
virulence of several bacterial pathogens, including Mycobacterium
tuberculosis (Guinn et al., 2004; Houben et al., 2014; Hsu
et al., 2003; Stanley et al., 2003), Mycobacterium marinum (Davis
and Ramakrishnan, 2009; Gao et al., 2004), and Staphylococcus
aureus (Burts et al., 2005). The significance of this secretion system
is further highlighted by the fact that loss of the ESX-1 T7S
system in M. tuberculosis is the most important genetic difference
between virulent strains that cause tuberculosis and the
live attenuated vaccine strain, BCG (Brodin et al., 2006; Mahairas
et al., 1996; Pym et al., 2003). However, despite its medical
importance and its broad evolutionary conservation, the molecular
architecture, mechanism of secretion, and regulation of T7S
are unknown.

T7S systems have been identified in many Gram-positive or- 
ganisms and are defined by the presence of two conserved ele- 
ments: EccC, a membrane-bound protein with three predicted 
ATPase domains, and EsxB, a small secretion substrate contain- 
ing a WXG motif (Bitter et al., 2009; Pallen, 2002). Other compo- 
nents have been genetically linked to T7S, but these are not 
universally conserved (Abdallah et al., 2007). EccC and EsxB 
interact physically (Stanley et al., 2003), and the last seven amino 
acids of EsxB constitute a “signal sequence” that is necessary 
and sufficient for secretion through the ESX-1 system (Champion 
et al., 2006), although additional signals adjacent to these se- 
quences are also required for full secretion (Daleke et al., 2012; 
Sysoeva et al., 2014). The molecular basis of T7S substrate-tar- 
geting selection is not known, and our understanding of substrate 
recognition has been mostly limited to yeast two-hybrid and ge- 
netic studies. One interesting feature of T7S is that substrates are 
co-dependent for secretion (Fortune et al., 2005), in that genetic 
removal of one substrate abrogates secretion of all other sub- 
strates through a specific T7S system. This unique feature of 
T7S has complicated the study of individual virulence factors in 
the context of infection and has thwarted attempts to genetically 
engineer these systems to secrete heterologous proteins. 

EccC has a unique multi-domain structure consisting of a two- 
pass transmembrane domain, a short domain of unknown func- 
tion (DUF), and three P loop NTPase domains that share ~20% 
Figure 1. The EccC ATPase Has a Unique,
Conserved Domain Structure and Binds to 
the EsxB Signal Sequence 

(A) Domain structure of the EccC and EssC 
ATPases. 

(B) Size exclusion chromatography showing that 
TcEsxB binds to TcEccC(cyto) and induces a large
shift in elution volume. 

(C) Yeast two-hybrid analysis of interactions be- 
tween EccC and EsxB. Wild-type TcEsxB and 
MfEsxBi are directed specifically to their cognate 
ATPase via the last seven amino acids (boxed), 
which are not required for interaction with EsxA. 
See also Figure S1 . 
identity to one another (Figure 1A). The ATPase domains are
evolutionarily related to the ASCE (additional strand conserved
glutamate) fold family that includes protein- and DNA-directed
mechanoenzymes such as FtsK, VirD4 (TrwB), and VirB4
(TrwK) (Erzberger and Berger, 2006). These motor proteins
generally assemble into hexameric rings, with ATPase activity
dependent on “arginine finger” residues that extend into adjacent
monomers to form the active site (Ahmadian et al., 1997).
The individual ATPase domains of EccC are unique in that
each has a long N-terminal linker that is of unknown function
but contains several motifs that are highly conserved among all
of the EccC proteins.

We present here a series of structures of EccC, both with and
without the EsxB signal sequence, that reveal that EccC exists in
an autoinhibited state as a tightly integrated set of three ATPase
domains joined to one another through specific linker/pocket
interactions. We show that EccC is activated by disruption
of one of these linker interactions and is further activated through
substrate-mediated multimerization of the enzyme. Our findings
suggest that substrates, in addition to serving roles outside of



the cell, are also necessary components
of the secretion apparatus itself, and
provide a mechanistic explanation for
the unique interdependence of substrate
secretion in T7S.

RESULTS 



The EsxB Signal Sequence Binds 
the EccC Translocase but Does Not 
Activate Its ATPase Activity 

To understand the nature of the interaction
between EccC and EsxB using an in vitro
system, we screened a panel of EccC/EsxB
pairs from various bacterial species
and found robust expression in E. coli of
the cytoplasmic portion of EccC from
the thermophilic actinobacterium Thermomonospora
curvata (TcEccC(cyto)) and
its cognate EsxB partner (TcEsxB). The
T. curvata secretion system shares close
homology with other actinomycete T7S
systems and contains all of the conserved
components identified in the M. tuberculosis Esx systems,
including EsxA, EsxB, EccC, EccD, EccB, and MycP1 (Bitter
et al., 2009) (Figure S1A). TcEccC(cyto), TcEsxA, and TcEsxB
were all stable in isolation and strongly bound one another to
form an EccC:EsxB:EsxA complex (Figure 1B; Figures S1B and
S1C). Similar to yeast two-hybrid studies with M. tuberculosis
proteins (Champion et al., 2006), the last seven amino acids of
TcEsxB specifically targeted the substrate to TcEccC, and
swapping this sequence with the C terminus of MtEsxB1
completely reversed the specificity (Figure 1C). Thus, the known
EccC interactions of the virulence-associated ESX-1 system of
M. tuberculosis are recapitulated in our model system.

In the ESX-1 system in M. tuberculosis, MtEccC is split into
two polypeptides, MtEccCa (containing the trans- and juxtamembrane
regions and ATPase1) and MtEccCb (containing
ATPase2 and ATPase3), which interact with one another to
form a complete MtEccCab complex (Stanley et al., 2003). The
substrate MtEsxB interacts exclusively with MtEccCb and not
with MtEccCa, and we found this feature was conserved in
our T. curvata system. When we artificially split TcEccC
Figure 2. Co-Crystal Structure Reveals Signal-Sequence Binding Pocket in TcEccCb

(A) The crystal structure of TcEccCb (ATPase domains colored as in Figure 1A) bound to the C-terminal signal sequence of TcEsxB (gold). Binding of the
C-terminal amino acids of TcEsxB to ATPase3 is mediated by interactions with two conserved hydrophobic residues that bind in a hydrophobic binding pocket.
Only the C-terminal signal-sequence residues are interpretable in the electron density (Figure S2A), and the Y-X-X-X-D/E motif implicated in secretion (Daleke
et al., 2012) appears disordered in the crystal.

(B and C) The orange volume represents the simulated-annealing difference-density map calculated for ATPase2 (B) and ATPase3 (C) without nucleotide and
contoured at 4σ.

(D) X-ray structure of the TcEsxAB heterodimer with a close-up view of the C-terminal signal-sequence helix. V98 and L102, which are necessary for binding to
TcEccC, are labeled.

(E) Binding of a fluorescently labeled signal-sequence peptide (5-FAM-VNRVOALLNG) to TcEccC(cyto) monitored in the presence of increasing concentrations of
unlabeled competing full-length TcEsxB. Wild-type TcEsxB and TcEsxAB heterodimer compete with the peptide. Mutations in L102 or V98 prevent competition
with the wild-type peptide, indicating that they do not bind. Presented data are from representative experiments.

See also Figure S2. 



into “TcEccCa” and “TcEccCb” fragments orthologous to
the M. tuberculosis ESX-1 proteins, these fragments interacted
robustly in the two-hybrid assay (Figure S1B). Likewise, EsxB
interacted directly with TcEccCb, but not TcEccCa, which parallels
the ESX-1 system (Figures S1B and S1D).

In analogy with other phylogenetically related translocases
(Guglielmini et al., 2013), which are often strongly activated by
their substrates (Massey et al., 2006), we hypothesized that
binding of TcEccC(cyto) to the substrate TcEsxB would activate
its ATPase domains. However, we could not detect any
ATPase activity in the TcEccC(cyto) or TcEccCb proteins, either
in the presence or absence of TcEsxB (Figure S1E). Likewise,
the nucleotide binding state of EccC had no effect on the
apparent KD of signal-sequence binding (Figure S1F). This unexpected
result suggested that the binding to EccC does not
immediately lead to work being done on the substrate.

The Structure of TcEccCb Bound to the Signal Sequence 

In order to understand the interaction between EsxB and EccC,
we solved the structure of TcEccCb (containing ATPase2 and
ATPase3) using data to 3.24 Å resolution, in combination with a
peptide containing the last 23 residues of the TcEsxB substrate,
including the C-terminal signal sequence (Figure 2A; Figure S2A
and Table S2). Both ATPase2 and ATPase3 are clearly bound to
ATP in the structure (Figures 2B and 2C), suggesting that the
ATPase activity of these domains is indeed extremely low,
even in the presence of saturating amounts of EsxB signal-sequence
peptide. This ATPase-inactivated state appears to
be evolutionarily conserved, as a high-resolution crystal structure
of a fragment of the related EssC ATPase from Geobacillus
thermodenitrificans (“GbEssCb”) has a very similar structure,
with both domains bound to ATP (Figure S2B).

The C terminus of the signal-sequence peptide, which was previously
thought to be unstructured (Renshaw et al., 2005), forms a
short amphipathic helix (residues 96-103) that interacts exclusively
with the hydrophobic pocket on ATPase3 (pocket3) (Figure 2A).
This C-terminal helix is likely a common feature of all
EsxB homologs (Poulsen et al., 2014). The helix was also present
in a higher-resolution structure of the full-length TcEsxBA complex
in the absence of ATPase3 (Figure 2D and Table S2). Here,
we observed the characteristic helical hairpin seen in all EsxB homolog
proteins; however, in our structure, the chain makes a turn
through a short extended region (residues 93-95) before ending
in a helix that matches the length of the helix observed in the
ATPase complex structure. Of note, this helix is found in a crystal
contact with an adjacent symmetry-related molecule, which
could artificially stabilize the helical structure. Although present
in the EsxB fragment crystallized with EccC, the Y-X-X-X-D/E
motif implicated in secretion (Daleke et al., 2012; Sysoeva
et al., 2014) was disordered in our crystal, suggesting that it is
not involved in recognition of the signal-sequence motif. Importantly,
pocket3 is distant from the ATP catalytic site, and binding
of the peptide does not appear to alter the ATP binding ability of
ATPase3. Mutation of any of the EsxB interaction residues on
either EsxB or EccC completely abrogated the interaction,
demonstrating the specificity of its binding to pocket3 (Figures
2E and S2C). Together, these data show that TcEsxB is targeted
to TcEccCb through specific binding to the hydrophobic pocket3
on ATPase3, but binding of the C terminus of TcEsxB neither requires
nor enhances nucleotide hydrolysis or exchange.

Because EccC lacked ATPase activity with or without substrate,
we examined the evolutionary conservation of each
ATPase domain among many unique EccC orthologs to determine
whether the residues required for ATPase activity are
conserved. We found that the catalytic residues of ATPase2
and ATPase3 are highly degenerate with respect to other related
ATPases, especially in the catalytic glutamate of the Walker B
motif (Figure S2D). Such changes might be expected to greatly
reduce or eliminate ATP hydrolysis (Wendler et al., 2012), which
is consistent with the presence of ATP in these domains
observed in our crystal structures. In contrast, ATPase1 is highly
conserved with its closest known homolog, the motor protein
FtsK, suggesting that ATPase1 may serve as the active motor
domain for EccC. Thus, ATPase2 and ATPase3 appear to be
naturally suboptimal ATPases, similar to the catalytically inactive
domains of other multimeric ATPases such as dynein (Carter
et al., 2011) and the F1-ATPase (Walker, 2013).

ATPase1 Is Inhibited by Its Interaction with ATPase2

To understand the structure of ATPase1 and its relationship to
ATPase2 and ATPase3, we solved the crystal structure of the
full cytoplasmic domain of TcEccC, “TcEccC(cyto),” using data
to 2.9 Å resolution (Figure 3A; Figures S3A and S3B; Table S3).
Although the full protein is present in the crystal (Figure S3C),
the N-terminal DUF domain and linker1 are disordered in the
structure and could not be modeled. The structure is monomeric,
as it is in solution (Figure S3D), and ATPase2 and ATPase3 are
very similar in both their conformation and nucleotide binding
state compared to the TcEccCb structure (RMSD 0.7 Å),
showing that binding of the signal sequence does not alter
EccC’s structure in these domains. The interface between
ATPase1 and ATPase2 is remarkably similar to the interface between
ATPase2 and ATPase3, joining together the three domains
in a direct translation where the only interfaces between the domains
are mediated by the inter-domain linkers. Highlighting the
general importance of these linker interactions, removal of the
N-terminal 34 amino acids homologous to linker2 on MtEccCb
completely blocked binding to MtEccCa (Figures S3E and
S3F). Single-particle electron microscopy and 3D reconstruction
of TcEccC(cyto) and a related ATPase from Geobacillus thermodenitrificans
revealed a similar monomeric structure that was
remarkably rigid, as illustrated by the homogeneity of the class
averages (Figures 3B and 3C; Figures S3G, S3H, S3I, and
S3J). The DUF domain, which is required for secretion in vivo
(Figure S3K), is also visible in these images, though its density
is reduced, likely due to averaging of multiple flexible states.

In ATPase1, the nucleotide binding residues and nucleotide
loading are strikingly different from the other two domains (Figures
3D and 3E). Despite the high ATP concentration in the crystallization
solution (5 mM), ATPase1 contains a sulfate ion in the
active site (Figure 3A), whereas ATPase2 and ATPase3 are bound
to ATP as they were in the signal-sequence-bound structure
(Figure 2). Several structural features of the ATPase1 catalytic
site are strongly reminiscent of the ATP “empty” (βE) subunit of
the F1-ATPase (Figures 3D and 3E), which is known to have a
very low affinity for nucleotide (Menz et al., 2001; Senior,
2012). In particular, the Walker A lysine is rotated into an unfavorable
rotamer and is bound to the Walker B aspartate, displacing
the binding of magnesium in the active site and likely preventing
binding of ATP. An analysis of all P loop ATPases in the Protein
Data Bank (Berman et al., 2002) that contain both ATP-bound
and ATP-unbound subunits in the asymmetric unit found that
this configuration of the enzymatic residues in the ATP binding
site is very unusual under these conditions and is essentially
restricted to structural models of the empty state of the F1-ATPase
(Figures 3F and 3G and Table S4).

Despite the low-affinity state in the crystal, the ability of
ATPase1 to bind ATP is required in vivo (Figure 3H; Figures
S3L, S3M, and S3N), showing that cycling of ATPase1 into an
ATP-avid conformation is required for the function of the secretion
system. We conclude that we have captured a low-nucleotide-affinity
state of ATPase1 that must be reversed during the
EccC catalytic cycle.

Overlaying the three ATPase domains revealed that the linker-pocket
interactions of ATPase1 and ATPase2 are analogous to
signal-sequence binding of ATPase3 (Figure 4A). In particular,
the important residues for signal-sequence binding in pocket3
have clear homologs in the linker2-pocket1 interaction (Figure 4B).
Since ATPases are often modulated by the effect of N- and C-terminal
appendages (Besprozvannaya et al., 2013; Karamanou
et al., 1999; Pena et al., 2011), we hypothesized that the attachment
between pocket1 and linker2 might allosterically regulate
ATPase1, locking it into the low-affinity form seen in the crystal
structure and leading to the low catalytic rate we observed
in vitro. Much of the interface between ATPase1 and linker2 is
mediated by a 100% conserved arginine in pocket1, R543, that interacts
with W762 and L763 in linker2 (Figures 5A and 5B). We
reasoned that loss of this interaction might mimic an allosteric
effector binding in pocket1 in a manner analogous to signal-sequence
binding to ATPase3 and might modulate the activity of
ATPase1. Indeed, mutation of R543 to alanine resulted in a sharp
increase in EccC ATPase activity (Figure 5C). Additional mutation
of an ATPase1 catalytic residue (E593Q) completely inhibited this
activation, suggesting that this increase in ATPase activity is
dependent on the activity of ATPase1. Because R543 does not
make any direct interactions with the ATP binding or catalytic residues,
these results strongly suggest that the activity of ATPase1 is
modulated allosterically by its interaction with the linker. Mutation
of the equivalent residue at the ATPase2-ATPase3 interface had
no significant effect on the activity of the enzyme. Therefore, we
conclude that the activity of ATPase1 is controlled through its
pocket-linker interaction with ATPase2. Importantly, mutation of
R543 to alanine in MtEccCa1 severely reduced secretion of
EsxB by M. tuberculosis (Figures S4A and S4B), indicating that
the interface between ATPase1 and ATPase2 is critical for the
secretion process. This is consistent with a model in which this
Figure 3. ATPase1 Is Autoinhibited and Integrated into a Rigid Array of ATPase Domains

(A) The crystal structure of TcEccC(cyto) highlighting the differences between the ATP-bound catalytic sites of ATPase2 and ATPase3 and the nucleotide-free site of
ATPase1. The orange volume represents a simulated-annealing difference-density map calculated without nucleotide or sulfate and contoured at 3σ. Note that
the ATPases insert has been rotated slightly to allow for comparison between the ATPase active sites. 

(B and C) Representative EM class average of (B) TcEccC(cyto) and (C) GbEssC(cyto) showing the linear structure of EccC. Scale bars, 100 Å.

(D) ATPase1 (purple), with the “empty” subunit of F1-ATPase (PDB 1H8H) overlaid in cyan. (E) ATPase2 (yellow) and ATPase3 (green) overlaid with the
AMP-PNP-bound subunit of F1-ATPase (cyan) from 1H8H.

(F) A graph representing the distance between the Walker A lysine amino group and the closest Walker B carboxylate oxygen, as a function of the rotameric
position of the Walker A lysine. Each triangle represents one of 311 PDB chains of an ATP-bound P loop ATPase identified by our protocol (see Extended
Experimental Procedures). The orientation of the Walker A lysine was confirmed in simulated-annealing difference density maps with the lysine residue removed.

(G) A graph similar to (F), except that each triangle represents the residues of “empty” ATPases from PDB entries that contain both a bound and an unbound
P loop ATPase domain in the same file.

(H) Western blot detection of MtEsxB1 and GroEL from cell supernatants (S) and cell pellet lysate (P) fractions of MtEccC1 knockout and complemented cells.

(I) Three-dimensional reconstruction at an estimated resolution of 23 Å, based on 1,634 images in the presence of 1 mM ATPγS and 10 mM MgCl2. The model
has been contoured to fit the crystal structure. Though it is impossible to resolve the difference between the first and fourth domains, the electron density of one is
much lower than that of the other three, suggesting that this domain is the DUF domain, which is disordered in the crystal structure.

See also Figure S3. 
Figure 4. Residues in Linker2 and Linker3 Mimic the Substrate and Bind to Pocket1 and Pocket2 on TcEccC

(A) The individual ATPase domains are shown and have been rotated to reveal the path of the linker across the ATPase domain. The linker is colored and weighted 
in diameter according to the degree of conservation across 142 unique EccC sequences. 

(B) The surface has been rotated to highlight the linker groove. ATPase3 and the pocket residues (Figure S2C) overlay ATPase1 and ATPase2 to highlight the homologies in the linker-binding and signal-sequence-binding pockets. The ATPase2 pocket is significantly shallower than those of ATPase1 and ATPase3.
interface couples substrate recognition in EccCb (ATPase2 and
ATPase3) with EccCa (ATPase1) activity.

EccC ATPase Activation by Substrate Binding 

In contrast to the wild-type enzyme, addition of TcEsxB to the activated TcEccC(cyto,R543A) mutant led to a 5-fold saturating increase in ATPase activity (Figure 6A), revealing that substrates can contribute to EccC activation if the autoinhibitory interaction between ATPase1 and ATPase2 is removed. Mutation of the TcEsxB residues responsible for the interaction with ATPase3 abrogated stimulation, demonstrating that the effect is specific to signal-sequence binding (Figure S4C). However, mutation of the Y-X-X-X-D/E motif in EsxB did not change stimulation by EsxB (Figure S4D), suggesting that these residues likely play a role at a different stage in the secretion cycle. The additional stimulation in response to TcEsxB required ATPase1 activity, while mutation of the catalytic residues in ATPase2 and ATPase3 significantly reduced but did not eliminate overall ATPase activity (Figure S4E). In accord with this finding, ATP binding by ATPase2 and ATPase3 is also required for secretion in vivo (Figure 3H). In contrast, ATPase2 and ATPase3 alone had no activity and were not stimulated by TcEsxB (Figure 6A). Thus, although binding of ATP by ATPase2 and ATPase3 is required for full activity of EccC, these domains act to regulate the activity of ATPase1 rather than contribute additively to overall ATPase activity. This is consistent with recent genetic evidence that the different ATPase domains play distinct roles during secretion in vivo (Ramsdell et al., 2014).



EccC Activity Is Controlled by Multimerization 

Binding of the EsxB signal sequence to ATPase3 appears to be a simple molecular recognition event (Figure 2), and our results suggest that binding is unlikely to change the conformation of ATPase1. We thus reasoned that substrate binding could activate EccC by regulating multimerization. Indeed, the related FtsK and TrwB ATPases form multimers during their catalytic cycle in which arginine residues ("R fingers") complete the active site of neighboring subunits (Gomis-Rüth et al., 2001; Massey et al., 2006; Wendler et al., 2012). ATPase1 has a completely conserved R finger (Figures S4G and S4H) that is required for secretion in vivo (Figure S4I), implying that formation of the active site of ATPase1 also involves multimerization. Furthermore, expression of ATPase-deficient versions of EccC in wild-type bacteria has a dominant-negative effect on secretion, consistent with this notion (Ramsdell et al., 2014).

In order to investigate the role of multimerization in the activation of EccC, we measured the dependence of kcat on increasing concentrations of enzyme. In the absence of multimerization, the kcat should be a constant property of the enzyme, but if the catalytic pocket of one ATPase molecule is assembled in trans with an arginine donated by a different ATPase molecule, the kcat of the enzyme should increase, as more arginine fingers become available with increasing concentration of enzyme. In the absence of TcEsxB, neither TcEccC(cyto) nor TcEccC(cyto,R543A) exhibited concentration-dependent ATPase activation (Figure 6B), suggesting that their
Figure 5. ATPase1 Is Held in an Autoinhibited State by Inter-ATPase Interactions

(A) Crystal structure of TcEccC(cyto) with inset highlighting the interface between ATPase1 and ATPase2.

(B) Logo diagram representing the alignment of 142 unique EccC sequences.

(C) Disruption of the ATPase1-ATPase2 interface by the R543A mutation activates the ATPase activity of TcEccC, which requires the Walker B catalytic residue in ATPase1 (E593Q). An analogous mutation between ATPase2 and ATPase3, R892A, led to a small increase in activity. ATPase activity from three independent enzyme preparations was measured in triplicate, and the mean of the means was plotted on the graph. Error bars represent SD of the means. The enzyme concentration was 1 μM ATPase, with saturating ATP·MgCl2 (10 mM).

See also Figure S4.
activity was not dependent on multimerization. In contrast, in the presence of a 10-fold excess of TcEsxB (TcEsxB+TcEccC(cyto,R543A)), the ATPase activity was strongly concentration dependent. To guarantee a one-to-one molar ratio between TcEsxB and TcEccC, we fused TcEsxB via a flexible 14-amino-acid linker to the C terminus of TcEccC(cyto,R543A). This protein was dimeric, as determined by analytical ultracentrifugation (Figure S4F), and, similarly to the TcEsxB+TcEccC(cyto,R543A) complex, the kcat of this chimera was highly concentration dependent (Figure 6B), with a maximal activity similar to that of the saturated TcEsxB:TcEccC(cyto,R543A) complex (>100-fold over wild-type). Mutation of the R-finger residue in the activated substrate-ATPase fusion protein (TcEsxB-TcEccC(cyto,R543A,R616Q)) reduced its activity to the baseline level of the R543A mutant (Figure S4J). Thus, the R543A mutant does not exhibit concentration-dependent activation in the absence of the substrate and also does not require the R-finger residue for its baseline activity. These data strongly support the idea that the active form of EccC is multimeric, but that this state is sparsely populated in the absence of the EsxB substrate. However, the activity of this multimeric form is manifest only in the setting of the permissive R543A mutation. These experiments define a hierarchy of activation in which both the R543A mutation and multimerization are required for appreciable ATPase activity.
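The logic of the kcat-versus-concentration test can be made concrete with a toy model. The sketch below assumes, for illustration only, a simple monomer-dimer equilibrium with a hypothetical dissociation constant `kd_uM`, in which dimerized subunits turn over faster because each receives an R finger in trans. The study does not establish the multimer stoichiometry (Figure 7 uses a variable "n"), and the function names and parameter values are hypothetical.

```python
import math


def fraction_in_dimer(total_uM: float, kd_uM: float) -> float:
    """Fraction of subunits in dimers for M + M <=> D, with Kd = [M]^2 / [D].

    Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd, solved for [M].
    """
    free_uM = kd_uM * (-1.0 + math.sqrt(1.0 + 8.0 * total_uM / kd_uM)) / 4.0
    return (total_uM - free_uM) / total_uM


def apparent_kcat(total_uM: float, kd_uM: float,
                  kcat_monomer: float, kcat_dimer: float) -> float:
    """Per-subunit turnover averaged over monomeric and dimeric subunits."""
    f = fraction_in_dimer(total_uM, kd_uM)
    return (1.0 - f) * kcat_monomer + f * kcat_dimer


# Hypothetical parameters: a slow monomer and a 100-fold faster dimer.
for conc in (0.1, 1.0, 10.0, 100.0):
    print(conc, round(apparent_kcat(conc, kd_uM=5.0,
                                    kcat_monomer=0.01, kcat_dimer=1.0), 3))
```

If `kcat_monomer` equals `kcat_dimer`, the apparent kcat is flat across enzyme concentrations, which is the behavior expected when activity does not depend on multimerization. A curve that rises and then plateaus is the signature of active sites assembled in trans.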



EsxB, but Not EsxA or EsxBA, Directly Multimerizes EccC Translocase

To probe the multimeric state of TcEccC(cyto) during active catalysis, we used glutaraldehyde crosslinking to capture higher-order multimers. Crosslinking TcEccC(cyto,R543A) in the presence of native TcEsxB or with the TcEsxB-TcEccC(cyto,R543A) fusion revealed both dimer and higher-order oligomeric states that are strongly correlated with activity (Figure 6A). These oligomers were specific to EccC and did not form with the substrate alone (Figure S5A) or when the ATPase was incubated with a signal-sequence mutant, TcEsxB(V98A) (Figures S5C and S5D). Importantly, while TcEsxB binding was insufficient to activate the ATPase activity of the wild-type enzyme (Figure 6A), it effectively drove TcEccC(cyto) into higher-order complexes (Figures S5B and S5D), indicating that substrate-mediated multimerization is not sufficient to override autoinhibition mediated by ATPase2. We found that EsxB exists as a homodimer in isolation (Figure S5E), and thus the substrate likely stabilizes multimers by first forming EccC:EsxB:EsxB:EccC complexes. Although the addition of EsxB leads to a clear increase in multimerization of the ATPase, it appears to stabilize a state that also occurs in the absence of EsxB, as both crosslinking (Figure S5B) and analytical ultracentrifugation (Figure S4F) experiments revealed a low level of EccC multimerization without EsxB.

Surprisingly, addition of TcEsxA, another substrate that forms a tight 1:1 complex with TcEsxB (Figure 2D; Figure S5F; Renshaw et al., 2002) but does not directly bind to EccC (Figures S1B and S5G), strongly inhibited ATPase activity in a cooperative manner (Hill coefficient > 2; Figure 6C), suggesting that each TcEsxA molecule affects the activity of more than two TcEccC(cyto,R543A)-TcEsxB molecules. The higher-order multimer concentration decreased with increasing TcEsxA in a pattern that directly mirrored the cooperative decrease in
Figure 6. EsxB and EsxA Substrates Control EccC Activity via Regulating Enzyme Multimerization

(A) The ATPase activity of the indicated TcEccC proteins was measured at different concentrations of TcEsxB. Multimerization of TcEccC(cyto,R543A), detected by glutaraldehyde crosslinking (bottom), increases with addition of TcEsxB (0-10 μM). Quantification of the multimer band is also indicated on the ATPase activity graph (green dotted line with squares) to demonstrate the correlation between multimer concentration and activity.

(B) ATPase activity of the indicated proteins, either TcEccC(cyto,R543A) +/- TcEsxB or TcEccC(cyto,R543A)-TcEsxB chimeras, was measured as a function of enzyme concentration. In (B) and (C), each point represents the mean of three independent measurements.

(C) ATPase activity of the TcEccC(cyto,R543A)-TcEsxB chimera was measured at different concentrations of TcEsxA (top), and multimerization of the enzyme in these reactions was assessed by glutaraldehyde crosslinking followed by SDS-PAGE (bottom; the top concentration of TcEsxA is 22 μM, and concentrations are reduced 2-fold in each lane to the left).

See also Figure S5.



ATPase activity (Figure 6C), and this was not affected by mutation of the arginine finger residue (Figures S5H and S5I). Furthermore, addition of TcEsxA to the TcEccC(cyto)-TcEsxB chimera led to loss of dimerization of this construct (Figure S5J). Thus, while EsxB homodimers promote assembly and activation of EccC, EsxA binding to EsxB-bound EccC leads to cooperative disassembly and inhibition of the multimeric ATPase.

To test whether EsxA-induced inhibition was due to disruption of the EsxB:EsxB dimerization event, we measured inhibition of ATPase activity in the TcEsxB-TcEccC(cyto,R543A) chimera with increasing concentrations of the signal-sequence mutant TcEsxB(V98A), which can still form homodimers but cannot bind to EccC. We found that TcEsxB(V98A) also inhibits activity (Figure S5K), supporting the notion that EsxA inactivates EccC by removing the stabilizing effect of the EsxB:EsxB interaction, presumably by forming EccC:EsxB:EsxA trimers instead of EccC:EsxB:EsxB:EccC tetramers.

DISCUSSION

In this work we have developed a thermophilic model system that allowed for the detailed dissection of the only two components of T7S conserved in all Gram-positive bacteria: EccC and EsxB. Based on our findings, we posit a model in which secretory substrates play an active regulatory role in T7S by modulating the activity of EccC (Figure 7). In the absence of EsxB, EccC is monomeric and tightly inactivated
Figure 7. Model of Substrate-Mediated Activation of EccC

In the absence of substrates, EccC is monomeric. Interaction with EsxB leads to dimerization of the ATPase and then higher-order multimerization but cannot activate the enzyme. In this study, we used the R543A mutation to disrupt the interaction between ATPase1 and ATPase2, although in vivo this role may be played by other proteins that bind to ATPase1 analogously to the binding of EsxB to ATPase3, or by other signals. Once ATPase1 is displaced, EccC is activated further by multimerization mediated by a conserved R finger. EsxA can disrupt the EsxB:EsxB interaction and disassemble the multimer. We have no evidence for the structure of the EccC:EsxB dimer or for the stoichiometry and structure of the multimeric form. Thus, both of these aspects of the model are speculative, though based on prior structures of related substrate proteins (i.e., 3GVM) and the FtsK-like ATPases (i.e., 2IUU). We have indicated this ambiguity with the variable "n" for the number of subunits in the multimer.

via interactions between ATPase1 and ATPase2. EsxB binding to ATPase3, which is relatively weak (~10 μM), drives EccC multimerization but is not sufficient for activation. Allosteric interactions through displacement of linkers from pocket1, which relieves the inhibitory interaction with ATPase2, are also required to permit activation of EccC. While we do not yet know the nature



of these activation signals, given the linker-pocket architecture 
found in each ATPase domain, we suspect that other substrates 
and/or T7S components bind to these pockets to create an 
“AND” logic gate by which secretion of multiple substrates is 
coordinated, explaining the phenomenon of mutually dependent 
type VII secretion (Fortune et al., 2005). The DUF and the trans- 
membrane domains also play an important role in secretion 
(Figure S3K), and full delineation of their contribution to the pro- 
cess awaits further experimentation. 

This model suggests that T7S activity may be governed by a 
simple, “just-in-time” post-translational control mechanism in 
which energy is expended only when key substrates are recog- 
nized by EccC (Bozdech et al., 2003). T7S may be poised for 
secretion under all conditions, with EccC waiting for delivery of 
complete sets of substrates, which would explain why removal 
of one substrate would inhibit secretion of others. In this way, 
T7S may not be directly regulated by environmental stimuli but, 
rather, actuated by signal transduction pathways that regulate 
synthesis of substrates, such as PhoP/R (Ryndak et al., 2008) 
and EspR (Raghavan et al., 2008). Although other control mechanisms may be in play, this mode of regulation would not only conserve ATP until it is needed but would also allow for coordinated secretion of multiple substrates, a function that may be beneficial for the organism.

Our results also suggest that EsxB homodimers, in addition to 
EsxAB heterodimers (Renshaw et al., 2002), play an important 
role in type VII secretion, a notion supported by the observation 
that ancestral T7S systems, such as those found in the phylum
Firmicutes, lack EsxA homologs and EsxB exists solely as a
homodimer (Poulsen et al., 2014). Experimental evidence from 
the literature also supports the idea that EsxB dimers have an 
important physiological role. For example, recent work shows 
that EsxB in Bacillus subtilis is secreted as a dimer (Sysoeva 
et al., 2014). Likewise, at least four unique crystal structures 
of different EsxB homodimers from various bacterial species 
have been deposited in the Protein Data Bank, including 
2VRZ (Sundaramoorthy et al., 2008), 3GVM (Poulsen et al., 
2014), 3ZBH, and 3090. Taken together, we believe that these 
results provide compelling evidence for the role of EsxB homo- 
dimers in vivo and support a model that one role of EsxA is 
to antagonize the stimulatory effects of EsxB on EccC. How 
substrates are actually translocated out of the cell upon binding 
EccC, the oligomeric state of substrates during translocation, 
and how EccC and/or other T7S proteins modify substrates 
before export remain important questions that require further 
study. 

Most motor proteins only display maximal ATPase activity in 
the presence of a mechanical load. Indeed, the activity of 
EccC, even when activated by the substrate and the R543A mu- 
tation, is relatively low. We believe that the activity we measure 
likely represents a basal ATPase rate without the “load” of sub- 
strates to be translocated across a membrane. A graded activa- 
tion of ATPase function is reminiscent of the activation of SecA 
translocase. In this case, SecA is nearly inactive when cytosolic 
(Lill et al., 1990) but is partially activated by its interaction with 
SecYEG, which releases an interdomain, allosteric inhibition in 
SecA leading to an increase in ATPase activity (Karamanou 
et al., 2007). The motor is thus primed for the translocation 
reaction, which is stimulated by its interaction with the signal-
sequence-bearing protein (Chatzi et al., 2014).

Our structural analysis shows other intriguing similarities to the 
Sec translocation system. The Sec translocase binds to a similar, 
small helical peptide using mixed electrostatic and hydrophobic 
interactions. In both cases, the binding occurs in a specialized 
groove that is distant from the ATPase active site (Gelis et al., 
2007), suggesting a role in targeting and orientation of sub- 
strates. There are also similarities to targeting of substrates in 
other secretion systems. For example, in the type III secretion 
system, a targeting sequence on a chaperone protein, CesAB, 
is required for interaction with the type III ATPase; however, 
the actual translocation is mediated by an entirely different signal 
(Chen et al., 2013). Given that several other regions of the EsxB and EsxA proteins have been implicated in translocation (Daleke et al., 2012; Sysoeva et al., 2014), we suspect that a similar division between targeting and substrate orientation is also present in the type VII system.

The EccC ATPase is phylogenetically related to the T4 secre- 
tion system coupling proteins, typified by the VirD4 ATPase in 
Agrobacterium tumefaciens (Guglielmini et al., 2013). These pro- 
teins also bind to a C-terminal sequence on substrate proteins 
that is necessary for secretion, but the molecular interactions 
and biochemical effects of substrate binding in these systems 
are unknown (Trokter et al., 2014). In the A. tumefaciens system,
three monomeric ATPases (VirB4, VirD4, and VirB11) are required
for secretion of substrates by the system. It is intriguing to spec- 
ulate that these three ATPases, which all appear to serve very 
different mechanistic purposes, may carry out functions analo- 
gous to the three ATPase domains of EccC, but this hypothesis 
awaits further structural information about the assembly and 
function of EccC and of the T4 secretion ATPases. 

Targeting T7S for inhibition is an attractive antibacterial strat- 
egy, given the centrality of these systems to pathogenesis in 
M. tuberculosis and S. aureus and their wide distribution among
Gram-positive bacteria (Chen et al., 2010). Our work suggests 
two unexpected targets for disruption of the function of T7 secre- 
tion. First, small molecules targeted to the inactive state of 
ATPasei may stabilize its autoinhibition (Schindler et al., 2000). 
Second, the interaction pocket for the substrate is quite deep 
and may be amenable to small molecule targeting. Additionally, 
knowing the molecular determinants of signal-sequence recog- 
nition may also allow us to design improved vaccine strains, 
which export subsets of immunodominant virulence factors but 
do not cause disease. 

EXPERIMENTAL PROCEDURES 

A full description of the methods, reagents, and crystallographic statistics is 
included in the Extended Experimental Procedures. 

Mycobacterial Mutants and Secretion Assays 

The ΔeccCa1/ΔeccCb1 deletion strain was created by homologous recombination using specialized transducing phage, as previously described (Glickman et al., 2000). Complementation of the eccC null mutant was carried out by cloning the entire M. tuberculosis rv3870-rv3871 locus into an integrating vector containing a C-terminal FLAG tag and under the control of the
predicted native promoter. Secretion assays were performed as described 
previously (Ohol et al., 2010). 



Protein Expression and Purification 

Recombinant proteins were subcloned by PCR into a pET vector system and 
expressed and purified from C41(DE3) strain E. coli using standard tech- 
niques. Details are available in the Extended Experimental Procedures. 

Electron Microscopy 

Single particles were picked from uranyl acetate-stained images and pro- 
cessed into classes containing ~60 images. Reconstructions were accom- 
plished as described in the Extended Experimental Procedures. 

Crystallization and Structure Solution 

Crystallization and structure solution are described in detail in the Extended Experimental Procedures. For the EccCTc(199-1315) structure, initial phases were determined with SAD phasing of a Ta6Br12 derivative at ~7.5 Å and were then improved by MIR phasing with Pt and Hg derivatives. An anomalous difference map determined by comparison to a selenomethionine derivative assisted model building. The other structures were solved using standard methods.

Biochemical Assays 

Steady-state ATPase activity was analyzed using a continuous coupled
assay (Kornberg and Pricer, 1951) adapted to a 96-well format.
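The coupling chemistry is not spelled out here. A common continuous format, and an assumption in this sketch, regenerates ATP with pyruvate kinase and lactate dehydrogenase so that each ATP hydrolyzed oxidizes one NADH, read as a loss of absorbance at 340 nm. Under that assumption, converting a plate-reader slope into a turnover number is a Beer-Lambert calculation. The effective path length of a 96-well plate depends on the well volume, so `path_cm` and the example reading are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1


def atpase_rate_uM_per_s(slope_a340_per_min: float, path_cm: float) -> float:
    """ATP hydrolysis rate from the A340 slope, assuming 1 NADH oxidized per ATP.

    The slope is negative because NADH is consumed, so its sign is flipped.
    """
    molar_per_min = -slope_a340_per_min / (NADH_EPSILON_340 * path_cm)
    return molar_per_min * 1e6 / 60.0


def turnover_per_s(rate_uM_per_s: float, enzyme_uM: float) -> float:
    """Apparent turnover number per enzyme molecule."""
    return rate_uM_per_s / enzyme_uM


# Hypothetical reading: A340 falls by 0.0622 per minute over a 1 cm path.
rate = atpase_rate_uM_per_s(-0.0622, path_cm=1.0)
print(round(rate, 4), round(turnover_per_s(rate, enzyme_uM=1.0), 4))
```

With these illustrative numbers the rate is 10 μM ATP per minute, about 0.167 μM/s, which at 1 μM enzyme corresponds to roughly 0.17 turnovers per second.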

Fluorescence anisotropy measurements were performed using a 5-FAM-VNRVQALLNG
peptide interacting with the EccCTc(199-1315) construct, as described in the
Extended Experimental Procedures.
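As a reminder of how such data are usually reduced, steady-state anisotropy is computed from the parallel and perpendicular emission intensities, and binding appears as a rise in anisotropy as the EccC construct is titrated. This sketch assumes a single binding site, labeled peptide far below Kd (so free protein is approximately total protein), and equal brightness of free and bound peptide. The G factor and all numbers are illustrative; the Discussion describes EsxB binding to ATPase3 as relatively weak (~10 μM), and that value is used here only as an example Kd.

```python
def anisotropy(i_parallel: float, i_perpendicular: float,
               g_factor: float = 1.0) -> float:
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    return ((i_parallel - g_factor * i_perpendicular)
            / (i_parallel + 2.0 * g_factor * i_perpendicular))


def single_site_anisotropy(protein_uM: float, kd_uM: float,
                           r_free: float, r_bound: float) -> float:
    """Expected anisotropy for 1:1 binding with protein in large excess."""
    bound_fraction = protein_uM / (kd_uM + protein_uM)
    return r_free + (r_bound - r_free) * bound_fraction


# Illustrative titration around a 10 uM Kd.
for p in (1.0, 10.0, 100.0):
    print(p, round(single_site_anisotropy(p, 10.0, r_free=0.05, r_bound=0.25), 3))
```

Fitting measured anisotropies to `single_site_anisotropy` (for example by least squares) would return Kd, r_free and r_bound; if the peptide concentration approaches Kd, a quadratic binding model that accounts for ligand depletion is the safer choice.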

For crosslinking assays, 2-5 μg of total protein was incubated with 0.2%
glutaraldehyde for 10 min and then quenched with 1 M Tris (pH 8.0). Dena- 
turing gels were stained with Coomassie or were western blotted. A full 
description is available in the Extended Experimental Procedures. 

Bioinformatics 

We attempted to identify all P loop ATPases in the PDB on a per chain basis 
and then analyzed the position of the Walker A lysine side chain relative to 
the position of the Walker B aspartic acid using programs designed by the 
authors. The initial list and the sorted lists described in the text are available 
in Table S4. Please see the Extended Experimental Procedures for full details. 
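The programs used for this analysis are not described in the main text. As a hypothetical sketch of the geometric quantities plotted in Figure 3F, the code below takes atom coordinates (parsed from a PDB file by any structure library) and returns the distance from the Walker A lysine NZ atom to the nearer carboxylate oxygen of the Walker B residue, plus a standard dihedral-angle function for describing side-chain rotamers. Which side-chain angle the study uses as the "rotameric position" is not stated, so the atoms passed to `dihedral_deg` are an assumption.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]

# Carboxylate oxygens by residue type (PDB atom naming).
CARBOXYLATE_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lys_nz_to_walker_b(lys_nz: Vec, walker_b_resname: str,
                       walker_b_atoms: Dict[str, Vec]) -> float:
    """Distance from the Walker A Lys NZ to the closer Walker B carboxylate oxygen."""
    names = CARBOXYLATE_OXYGENS[walker_b_resname]
    return min(math.dist(lys_nz, walker_b_atoms[name]) for name in names)


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For a lysine, the chi angles are defined over the N-CA-CB-CG-CD-CE-NZ chain, so feeding, for example, the CG, CD, CE and NZ coordinates to `dihedral_deg` gives the outermost side-chain torsion.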

Genetic Interaction Studies 

Directed yeast two-hybrid studies were performed using a LacZ reporter 
system, as described previously (Champion et al., 2006). Strain names and 
additional procedures are available in the Extended Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, five 
figures, and four tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.03.040.
In Brief

Single-molecule assays reveal that 
loading of the two replicative helicase 
complexes at eukaryotic origins depends 
on two distinct mechanisms and that 
helicase-helicase interactions ensure 
their proper orientation to initiate 
bidirectional replisome assembly. 
• Single-molecule studies of origin licensing reveal new steps
in helicase loading 
release direct helicase loading
head-to-head double hexamer
SUMMARY

Loading of the ring-shaped Mcm2-7 replicative 
helicase around DNA licenses eukaryotic origins of 
replication. During loading, Cdc6, Cdt1, and the 
origin-recognition complex (ORC) assemble two het- 
erohexameric Mcm2-7 complexes into a head-to- 
head double hexamer that facilitates bidirectional 
replication initiation. Using multi-wavelength single- 
molecule fluorescence to monitor the events of heli- 
case loading, we demonstrate that double-hexamer 
formation is the result of sequential loading of 
individual Mcm2-7 complexes. Loading of each 
Mcm2-7 molecule involves the ordered association 
and dissociation of distinct Cdc6 and Cdt1 proteins. 
In contrast, one ORC molecule directs loading of 
both helicases in each double hexamer. Based 
on single-molecule FRET, arrival of the second 
Mcm2-7 results in rapid double-hexamer formation 
that anticipates Cdc6 and Cdt1 release, suggesting 
that Mcm-Mcm interactions recruit the second heli- 
case. Our findings reveal the complex protein dy- 
namics that coordinate helicase loading and indicate 
that distinct mechanisms load the oppositely ori- 
ented helicases that are central to bidirectional repli- 
cation initiation. 



INTRODUCTION 

Eukaryotic DNA replication must occur faithfully each cell cycle 
to maintain genomic stability. Initiation of replication occurs at 
genomic sites called origins. To ensure that no origin initiates 
replication more than once per cell cycle, the cell restricts the 
DNA loading and activation of the Mcm2-7 replicative helicase 
to distinct cell-cycle stages (Siddiqui et al., 2013). Importantly, 
helicase loading (also known as pre-RC formation) licenses ori- 
gins of replication by establishing the correct architecture for 
helicase activation and bidirectional replication initiation. 

Three helicase-loading proteins direct Mcm2-7 loading: the origin recognition complex (ORC), Cdc6, and Cdt1 (reviewed in Yardimci and Walter, 2014). ORC binds origins of replication and recruits Cdc6 at the M/G1 transition. Cdc6-bound ORC recruits Mcm2-7 in complex with Cdt1 to origin DNA. In an ATP-hydrolysis-dependent reaction, recruited Mcm2-7 complexes
are loaded around the origin DNA (Coster et al., 2014; Kang 
et al., 2014). Helicase loading requires opening and closing of 
the toroidal Mcm2-7 ring between the Mcm2 and Mcm5 sub- 
units (Bochman and Schwacha, 2008; Costa et al., 2011; Samel
et al., 2014). The product of helicase loading is a pair of tightly
interacting Mcm2-7 complexes that encircle the double-stranded
origin DNA in a head-to-head conformation, with staggered 
Mcm2/5 gates (Costa et al., 2014; Evrin et al., 2009; Remus 
et al., 2009; Sun et al., 2014). 

Although the structure of the double-hexamer product of heli- 
case loading is clear, important questions remain about how the 
helicase-loading proteins achieve this outcome. In particular, the 
mechanisms that load the first and second Mcm2-7 complex in 
opposite orientations are unclear (reviewed in Yardimci and Wal- 
ter, 2014). Do the two Mcm2-7 complexes associate and load 
simultaneously or in an ordered fashion? Do the same or different 
ORC and Cdc6 proteins load each Mcm2-7 complex? To
address these questions, we have developed single-molecule 
assays to monitor helicase loading. 

Single-molecule studies are a powerful tool to address ques- 
tions of stoichiometry and dynamics during DNA replication 
events. Studies of this type have led to important insights 
including the dynamics and number of DNA polymerases acting 
at the replication fork (reviewed in Stratmann and van Oijen, 
2014). Extending these approaches to replication initiation has 
the potential for additional discovery. Unlike current ensemble 
helicase loading assays, which can only detect events that sur- 
vive multiple washes, single-molecule approaches readily detect 
short-lived interactions during cycles of enzymatic function. 
Single-molecule approaches also allow stoichiometric determi- 
nations that are difficult with ensemble helicase loading assays 
due to DNA-to-DNA asynchrony and heterogeneity. Finally, 
although multi-step reactions are frequently asynchronous, 
post hoc synchronization of single-molecule data allows precise 
kinetic analysis of reaction pathways. 

We have developed single-molecule assays that monitor the 
DNA association of eukaryotic helicase-loading proteins using 
colocalization single-molecule spectroscopy (CoSMoS) (Fried- 
man et al., 2006; Hoskins et al., 2011). We show that the two 
Mcm2-7 hexamers are recruited and loaded in separate events 
that require distinct Cdc6 and Cdt1 molecules. In contrast, one
ORC molecule directs loading of both Mcm2-7 complexes pre- 
sent in a double hexamer. Consistent with distinct mechanisms 
loading the two hexamers, we observe kinetic differences be- 
tween events associated with loading the first and second 
helicase. By combining CoSMoS with fluorescence resonance
energy transfer (FRET), we demonstrate that formation of the
Mcm2-7 double-hexamer interface precedes dissociation of
Cdc6 and Cdt1, suggesting interactions with the first Mcm2-7,
rather than ORC, drive recruitment of the second helicase. Our 
observations reveal both the complex protein coordination 
required to assemble Mcm2-7 double hexamers and the mech- 
anisms ensuring the two Mcm2-7 molecules are loaded in 
the opposite orientations required for bidirectional replication 
initiation. 
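FRET reports on the close approach of two labeled molecules because transfer efficiency falls off steeply with distance. The sketch below shows the standard Förster relation and an uncorrected (apparent) efficiency computed from donor and acceptor intensities. Label positions and the Förster radius R0 are not given in this section, so `r0_nm` and the example numbers are illustrative, and a real analysis would also correct for background, spectral crosstalk and detection efficiency.

```python
def forster_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Energy-transfer efficiency for one donor-acceptor pair at a fixed distance."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def apparent_fret(donor_counts: float, acceptor_counts: float) -> float:
    """Uncorrected FRET efficiency from background-subtracted intensities."""
    return acceptor_counts / (donor_counts + acceptor_counts)


# Illustrative R0 of 5 nm: efficiency versus donor-acceptor distance.
for d in (2.5, 5.0, 10.0):
    print(d, round(forster_efficiency(d, r0_nm=5.0), 3))
```

With R0 = 5 nm, efficiency drops from about 0.98 at 2.5 nm to 0.5 at 5 nm and below 0.02 at 10 nm, which is why the appearance of a high-FRET state can be read as formation of a close protein-protein interface.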

RESULTS 

A Single-Molecule Assay for Helicase Loading 

To develop a single-molecule assay for eukaryotic helicase 
loading, we used CoSMoS to monitor the origin-DNA association 
of the proteins required for this process (ORC, Cdc6, Cdt1, 
Mcm2-7). First, we immobilized origin-containing DNA by 
coupling it to microscope slides. We determined the location 
of surface-attached DNA on the slide using a DNA-coupled fluo- 
rophore (Figure 1A). We monitored associations of one or two 
proteins (labeled with distinguishable fluorophores) with origin 
DNA using colocalization of the protein- and DNA-associated 
fluorophores (Figure S1A). Fluorescent labeling of ORC, Cdc6, 
Cdt1, and Mcm2-7 was accomplished using a SNAP-tag or 
sortase-mediated coupling of fluorescent peptides (Gendreizig 
et al., 2003; Popp et al., 2007). In each case, the fluorescent 
tags did not interfere with protein function in ensemble helicase-loading reactions (Figure S1B). After imaging the locations
of slide-coupled DNA molecules, purified ORC, Cdc6, and Cdt1/ 
Mcm2-7 were added (one or two of which were fluorescently 
labeled), and the location of each DNA molecule was continu- 
ously monitored for labeled protein colocalization in 1-s intervals 
for 20 min. 
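The colocalization readout can be sketched as follows: for each 1-s frame, a DNA location counts as occupied if a protein spot lies within a chosen radius, and contiguous runs of occupied frames give dwell times. The pixel radius, the spot-list input format and the function names are hypothetical. Real CoSMoS analysis also corrects for drift and maps the two color channels onto each other, which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def occupied_frames(dna_xy: Point, spots_per_frame: Sequence[Sequence[Point]],
                    radius_px: float = 1.5) -> List[bool]:
    """True for each frame in which any protein spot lies within radius_px of the DNA."""
    return [any(math.dist(dna_xy, spot) <= radius_px for spot in spots)
            for spots in spots_per_frame]


def dwell_times(occupied: Sequence[bool], frame_s: float = 1.0) -> List[float]:
    """Durations (s) of contiguous runs of occupied frames."""
    dwells: List[float] = []
    run = 0
    for flag in occupied:
        if flag:
            run += 1
        elif run:
            dwells.append(run * frame_s)
            run = 0
    if run:  # event still ongoing when imaging stopped (right-censored)
        dwells.append(run * frame_s)
    return dwells


# Toy example: four 1-s frames, each with its list of detected spots.
frames = [[(10.4, 20.3)], [], [(30.0, 5.0)], [(11.0, 20.0)]]
occ = occupied_frames((10.0, 20.0), frames)
print(occ, dwell_times(occ))
```

Dwells longer than 30 s could then be selected with `[d for d in dwells if d > 30.0]`, mirroring the stable-association criterion used in this section.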

Multiple observations indicated that Mcm2-7-DNA colocalizations represented events of helicase loading (Table S1; Movies
S1, S2, and S3). First, colocalizations of Mcm2-7 with the DNA
were dramatically reduced in the absence of ORC or Cdc6, pro- 
teins required for helicase loading (Yardimci and Walter, 2014). 
Second, stable association (>30 s) of Mcm2-7 was dependent 
on the presence of the ORC DNA binding site (the ARS- 
consensus sequence, ACS). Third, ORC, Cdc6, origin DNA, 
and ATP hydrolysis were each required to form Mcm2-7 molecules that were resistant to a high-salt wash (Table S1), a
biochemical test for loaded helicases encircling double- 
stranded DNA (dsDNA) independently of helicase-loading pro- 
teins (Donovan et al., 1997; Randell et al., 2006). 

Mcm2-7 Association and Loading Occur in a One-at-a-Time Manner

Our initial studies monitored Mcm2-7 association with origin DNA. We performed CoSMoS helicase-loading experiments using Mcm2-7 containing SNAP-tagged Mcm4 labeled with a 549 fluorophore (Figure 1) and unlabeled ORC, Cdc6, and Cdt1. Over the course of 20 min, we observed both single- and double-stepwise increases in Mcm2-7-associated fluorescence intensity at origin DNAs (Figures 1B and S1C). Mcm2-7 dwell-time distributions were multi-exponential, with many short-lived (<30 s) and fewer longer-lived (>30 s) relative increases in fluorescence intensity, suggesting at least two distinct types of Mcm2-7 association with the DNA (Figure 1C).
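A multi-exponential dwell-time distribution is what one expects when binding events with different lifetimes are mixed: on a semilog plot, a single exponential survival curve is a straight line, whereas a mixture bends. The sketch below computes an empirical survival curve and a two-component exponential model for comparison. The amplitudes and lifetimes are placeholders, not fitted values from this study.

```python
import math
from typing import List, Sequence


def empirical_survival(dwells_s: Sequence[float], times_s: Sequence[float]) -> List[float]:
    """Fraction of dwell times strictly longer than each query time."""
    n = len(dwells_s)
    return [sum(1 for d in dwells_s if d > t) / n for t in times_s]


def biexponential_survival(t_s: float, amp_short: float,
                           tau_short_s: float, tau_long_s: float) -> float:
    """Survival function of a two-component exponential mixture."""
    return (amp_short * math.exp(-t_s / tau_short_s)
            + (1.0 - amp_short) * math.exp(-t_s / tau_long_s))


# Placeholder parameters: 80% short events (tau 5 s) and 20% long events (tau 200 s).
print(round(biexponential_survival(30.0, 0.8, 5.0, 200.0), 3))
```

Under these placeholder values roughly 17% of events outlast 30 s, almost all of them from the long-lived class, which illustrates why a 30 s threshold is a useful way to enrich for the longer-lived type of association.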

There are two possible explanations for the multiple stepwise increases in DNA-colocalized Mcm2-7-coupled fluorescence. The simplest interpretation of these data is that Mcm2-7 hexamers associate with origin DNA in a one-at-a-time manner, with multiple hexamers accumulating over time. Alternatively, each increase in fluorescence could be due to the simultaneous association of a Mcm2-7 multimer (e.g., a pre-formed dimer of two Mcm2-7 hexamers). To distinguish between these possibilities, we used photobleaching to count the number of DNA-associated Mcm2-7 hexamers. To this end, we first observed Mcm2-7^4SNAP549 associations with DNA and then washed the surface-tethered DNA molecules with reaction buffer, removing unbound proteins. Then, to promote photobleaching, we increased laser excitation power and removed oxygen scavengers. Comparing the number of Mcm2-7^4SNAP549 photobleaching steps after the wash with the number of association steps that accumulated before the wash showed that no single-step increase in fluorescence before the wash resulted in two-step photobleaching afterward (Figure 1D, top). We confirmed that loss of fluorescence was due to photobleaching and not dissociation of Mcm2-7 by observing previously non-illuminated microscope fields of view. These data eliminate models in which multiple Mcm2-7 complexes are recruited simultaneously. We conclude that Mcm2-7 association occurs in a one-at-a-time manner.
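The photobleaching argument reduces to a simple bookkeeping check: count arrival steps before the wash, count bleaching steps after it, and flag any DNA where more dyes bleached than arrived. This is a hypothetical sketch. It assumes each trace has already been idealized into constant-intensity segments by a step-fitting method that is not shown, and it treats `min_step` as a user-chosen threshold. Neither detail comes from the paper.

```python
import numpy as np

def count_steps(idealized_trace, min_step, direction):
    """Count step changes larger than `min_step` in an idealized intensity trace.

    direction="up" counts association (arrival) steps; direction="down" counts
    photobleaching steps. The trace is assumed to be piecewise constant, e.g.
    the output of a separate change-point or step-fitting routine (not shown).
    """
    d = np.diff(np.asarray(idealized_trace, dtype=float))
    if direction == "up":
        return int(np.count_nonzero(d > min_step))
    if direction == "down":
        return int(np.count_nonzero(d < -min_step))
    raise ValueError("direction must be 'up' or 'down'")

def contradicts_one_at_a_time(n_arrival_steps, n_bleach_steps):
    """True if more dyes bleached than arrival steps were seen.

    Loss of molecules during the wash can only lower the bleach count, so an
    excess of bleaching steps is the signature expected if pre-formed
    multimers had arrived in a single step.
    """
    return n_bleach_steps > n_arrival_steps

# Hypothetical idealized traces (arbitrary intensity units).
before_wash = [0, 0, 100, 100, 100]   # one arrival step
after_wash = [100, 100, 100, 0, 0]    # one bleaching step
arrivals = count_steps(before_wash, 50, "up")
bleaches = count_steps(after_wash, 50, "down")
print(arrivals, bleaches, contradicts_one_at_a_time(arrivals, bleaches))
```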

We next asked whether loading of salt-resistant Mcm2-7 hexamers around origin DNA occurred sequentially or simultaneously. We used the same photobleaching assay (described above), except that a high-salt wash was used to remove any incompletely loaded Mcm2-7 complexes prior to photobleaching. If both Mcm2-7 hexamers are loaded simultaneously, we should observe only even numbers of high-salt-resistant hexamers. In contrast, if loading occurs sequentially, we should observe both even and odd numbers of high-salt-resistant hexamers. At low protein concentrations, we observed both one- and two-step photobleaching events (Figures 1D, bottom, and 1E). Roughly half (79/160) of all single Mcm2-7-associated fluorophores that colocalized with origin DNA before the high-salt wash were high-salt resistant, and 67% (40/60) of the double-Mcm2-7-associated fluorophores were high-salt resistant. When we increased protein concentrations, we also observed DNA molecules with three and four origin-dependent, high-salt-resistant Mcm2-7 complexes (Figure S1D), indicating that more than one double-hexamer loading event occurred at a single origin.

We considered the possibility that the apparent colocalization of odd numbers of loaded Mcm2-7 complexes was due to incomplete fluorescent labeling of Mcm2-7. For example, a single salt-resistant Mcm2-7-associated fluorophore could be the result of loading two Mcm2-7 complexes, only one of which is fluorescently labeled. To address this possibility, we purified Mcm2-7 complexes that were labeled on two subunits with different fluorophores (Mcm2-7^4SNAP549/7SORT649). Because the SNAP-tag and sortase labeling approaches are independent of each other, we could use single-molecule imaging to determine
Figure 1. Mcm2-7 Hexamers Associate with and Are Loaded on DNA in a One-at-a-Time Manner

(A) Schematic for the single-molecule helicase-loading assay. Alexa-Fluor-488-labeled (blue circle) 1.3 kb origin DNAs were coupled to microscope slides.
Purified ORC, Cdc6, and Cdt1/Mcm2-7 (at least one fluorescently labeled, Mcm2-7 in this illustration) were incubated with slide-coupled DNA, and colocalization 
of the fluorescently labeled protein with the DNA was monitored. 

(B) Mcm2-7 complexes sequentially associate with origin DNA. Plots display the Mcm2-7^4SNAP549 fluorescence intensity recorded at two representative DNA
molecules. Insets show fluorescence images (4 × 1 s) taken during the sequential association of the first (red arrow) and second (blue arrow) Mcm2-7.

(C) Mcm2-7 dwell times on DNA have a multiexponential distribution. Mcm2-7 dwell times were plotted as a histogram. Combined data from first and second 
Mcm2-7 associations are included; vertical axis represents the number of dwells of the specified duration per second per DNA molecule. Red bars are results 
from a separate experiment using mutant origin DNA. Inset shows the distribution of Mcm2-7 dwell times on DNA molecules as a semilogarithmic cumulative 
survival plot; only a portion of the entire plot is shown to emphasize that the distribution has at least two exponential components. 

(D) Mcm2-7 associates with DNA one at a time. The number of associations present at standard protein concentrations before a reaction buffer (top) or high-salt
buffer (0.5 M NaCl; bottom) wash is compared to the number of fluorophores detected by photobleaching immediately after the wash.

(E) Two representative traces before and after a high-salt wash and photobleaching. Reactions were washed twice with a high-salt buffer and imaged at higher
laser power in the absence of an oxygen scavenging system until all fluorophores were photobleached. Traces of Mcm2-7^4SNAP549 associations during the
reaction (green) are plotted adjacent to photobleaching steps after a high-salt wash (blue).

the efficiency of each labeling protocol (79% for SNAP and 77% for sortase). This labeling protocol also increased the proportion of Mcm2-7 complexes that have at least one coupled fluorophore to 95%. Using the measured labeling efficiencies, we determined the number of high-salt-resistant Mcm2-7 complexes with no more than one of each fluorophore that would
Figure 2. Distinct Cdt1 Molecules Load the First and the Second Mcm2-7 Hexamer

(A) Cdt1 molecules arrive with Mcm2-7 but release quickly after the complex arrives. A representative two-color recording of labeled Mcm2-7 and Cdt1^SORT549 fluorescence at an origin-DNA location is shown. The baseline of the red plot (Mcm2-7) is shifted up relative to the green plot (Cdt1) throughout when two-color recordings are displayed together. The sequence of single-frame images of the Cdt1- and Mcm2-7-fluorescent spots illustrates the concurrent arrival of Cdt1 and Mcm2-7. Cdt1 release occurs either with (green arrow) or without (black arrows) concurrent Mcm2-7 release.

(B) Cdt1 dwell times on DNA have a multi-exponential distribution. Cdt1 dwell times were plotted as a histogram. Inset shows a semilogarithmic cumulative survival plot as in Figure 1C.

(C and D) There are two types of Cdt1 release events. (C) Histogram shows the duration of Cdt1 origin-DNA associations when Cdt1 releases with Mcm2-7. The mean dwell time ±SEM is reported. (D) Histogram shows the duration of Cdt1 origin-DNA associations when Cdt1 releases before Mcm2-7. The mean dwell time ±SEM is reported.



be expected if only double hexamers were loaded (Figure S1E, model II). Assays with Mcm2-7^4SNAP549/7SORT649 yielded single, salt-resistant fluorophores in a proportion that is inconsistent with this model. Instead, our data are consistent with a model in which both single and double hexamers are loaded (in a 52:48 ratio based on our data; Figure S1E, model I). We conclude that Mcm2-7 complexes are both recruited and loaded onto origin DNA in a sequential manner.
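The labeling argument can be made concrete with a simplified model. This is an illustrative assumption, not the authors' Figure S1E calculation: each hexamer independently carries at least one dye with probability q, and only double hexamers are loaded. Under that model, a detected double hexamer looks like a single labeled hexamer with probability 2q(1-q)/[q(2-q)] = 2(1-q)/(2-q). The helper names are hypothetical. The efficiencies 0.79 (SNAP) and 0.77 (sortase) are the values quoted above.

```python
def p_at_least_one_label(*efficiencies):
    """Probability a hexamer carries >= 1 dye, assuming independent labeling reactions."""
    p_none = 1.0
    for e in efficiencies:
        p_none *= 1.0 - e
    return 1.0 - p_none

def single_fraction_if_only_doubles(q):
    """Fraction of detected double hexamers in which only one hexamer is labeled.

    q is the per-hexamer probability of carrying at least one dye. With two
    hexamers labeled independently, P(exactly one) = 2q(1-q) and
    P(at least one) = q(2-q), so the ratio is 2(1-q)/(2-q).
    """
    return 2.0 * (1.0 - q) / (2.0 - q)

q_snap = p_at_least_one_label(0.79)          # SNAP label only
q_dual = p_at_least_one_label(0.79, 0.77)    # SNAP + sortase labels
print(round(q_dual, 4))                                   # 0.9517
print(round(single_fraction_if_only_doubles(q_snap), 3))  # 0.347
print(round(single_fraction_if_only_doubles(q_dual), 3))  # 0.092
```

In this simplified model, single-SNAP labeling would make about a third of labeled double hexamers look like single hexamers. Dual labeling lowers that fraction to under 10%, which makes incomplete labeling a much weaker explanation for odd hexamer counts. The first printed value matches the 95% figure given in the text.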

Distinct Cdc6 and Cdt1 Molecules Load the First and Second Mcm2-7

We investigated the number of Cdt1 and Cdc6 molecules required for helicase loading and their relative times of DNA association. Both proteins are essential for loading but show little or no association with DNA in bulk assays (Coster et al., 2014; Kang et al., 2014), suggesting that their protein and/or DNA associations during helicase loading are transient. To detect these associations, we simultaneously monitored the binding of two different protein pairs labeled with distinguishable fluorophores: either Cdt1 with Mcm2-7 or Cdc6 with Mcm2-7. The associations of both fluorophores with origin DNA were monitored simultaneously, revealing relative times of arrival and departure for the two molecules in each pair.

Consistent with their recruitment to origins as a complex, we typically observed that Cdt1 and Mcm2-7 associated with origin DNA simultaneously (Figure 2A; Figures S2A-S2C). Uncommon instances in which Cdt1 or Mcm2-7 was seen associating separately (Cdt1 alone: 11.4%; Mcm2-7 alone: 18.6%) are likely caused by incomplete dye labeling of the other protein, because the frequencies of these events are similar to the fractions of unlabeled Mcm2-7 or Cdt1 (14% and 20%, respectively). Like Mcm2-7, Cdt1 dwell times followed a multi-exponential distribution, indicating the presence of at least two types of Cdt1-containing complexes on the DNA (Figure 2B). Consistent with this interpretation, we identified two classes of Mcm2-7/Cdt1 dwell-time and departure behaviors. In many instances, Cdt1 and Mcm2-7 were released simultaneously (i.e., within 1 s; see Figures S2B and S2C). This release pattern occurs whether or not the DNA molecule already had an associated Mcm2-7. These associations were typically short lived (Figure 2C) and represent non-productive binding events. Interestingly, these events were less frequent if the Mcm2-7/Cdt1 was the second (29%) rather than the first (53%) to arrive at the DNA. In the remaining cases, Cdt1 was typically longer lived (Figure 2D) and was released from origin DNA by itself, leaving behind an associated Mcm2-7. Only instances in which Cdt1 is released independently of Mcm2-7 can be on the pathway for double-hexamer formation. Because Cdt1-associated fluorophore photobleaching was much slower than Cdt1 dissociation (Figure S2D; Table S2), nearly all loss of fluorescent colocalization was due to dissociations, not photobleaching.
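The two classes of Cdt1 release can be written as a simple classification rule. This minimal sketch assumes departure times have already been extracted from the two-color records. It uses the 1 s co-release window stated above. The class and field names are hypothetical, and the third label covers the remaining case that the text does not discuss.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cdt1Dwell:
    """One Cdt1 dwell on a DNA molecule; times in seconds from recording start."""
    cdt1_departure: float
    mcm_departure: Optional[float]  # None if Mcm2-7 was still bound when recording ended

def classify_release(dwell: Cdt1Dwell, tolerance: float = 1.0) -> str:
    """Label a Cdt1 release relative to Mcm2-7 departure.

    "co-release": Cdt1 and Mcm2-7 leave within `tolerance` s (non-productive).
    "cdt1_alone": Cdt1 leaves while Mcm2-7 stays (candidate productive event).
    "mcm_first":  Mcm2-7 leaves clearly before Cdt1 (neither main class).
    """
    if dwell.mcm_departure is None:
        return "cdt1_alone"
    gap = dwell.mcm_departure - dwell.cdt1_departure
    if abs(gap) <= tolerance:
        return "co-release"
    return "cdt1_alone" if gap > 0 else "mcm_first"

print(classify_release(Cdt1Dwell(cdt1_departure=10.0, mcm_departure=10.5)))
```

Counting the labels separately for first- and second-arriving Mcm2-7/Cdt1 complexes would give the kind of comparison reported above (29% versus 53% co-release).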

Like Cdt1, Cdc6 association with the DNA is dynamic, with distinct molecules acting during loading of the first and second Mcm2-7 (Figure 3A; Figure S3A). Simultaneous analysis of Mcm2-7 and Cdc6 DNA association showed short Cdc6-DNA associations (mean lifetime 27.8 ± 1.5 s; Figure S3B), a subset of which directed Mcm2-7 recruitment (35.8%, n = 514;
Figure 3. Distinct Cdc6 Molecules Recruit and Load the First and the Second Mcm2-7 Hexamer

(A) Distinct Cdc6 molecules anticipate each Mcm2-7 association. A representative fluorescence intensity record for labeled Cdc6 and Mcm2-7 at origin DNA. Images of the Cdc6- and Mcm2-7-associated fluorescent spots show that Cdc6 binds before the arrival of the first Mcm2-7 complex.

(B) Cdc6 association anticipates binding of the first 
and second Mcm2-7. Full histogram (top) and 
expanded view (bottom) of Mcm2-7 arrival time 
minus the closest Cdc6 arrival time on the same 
DNA molecule (blue bars). Data are separated into 
Mcm2-7 complexes arriving at the DNA first (left) 
or second (right). In >85% of the observations the 
difference was greater than zero, indicating that 
Cdc6 arrived before Mcm2-7; in the remaining 
<15%, Mcm2-7 arrived before Cdc6 (likely due to 
an unlabeled Cdc6 molecule). Red bars show a 
control analysis in which each Mcm2-7 arrival time 
was paired with the closest Cdc6 arrival time on a 
different, randomly selected DNA molecule. The 
randomized control does not show the prominent
peak at differences between 0 and +50 s, indicating
that the sequential association of Cdc6 and Mcm2-7
was not coincidental.



Figures 3A and S3A). Cdc6 consistently anticipated Mcm2-7 arrival at the DNA (>85%; Figures 3A and S3A). The remaining cases likely reflected the action of unlabeled Cdc6. We observed that distinct Cdc6 proteins direct recruitment of the first and second Mcm2-7 with a similar rate constant (Figure S3C). The high frequency of Cdc6 DNA associations led us to test and confirm that sequential binding of Cdc6 and Mcm2-7 was not coincidental for either Mcm2-7 loading event (Figure 3B).

Release of Cdc6 and Cdt1 Is Sequential during Helicase Loading

We next asked whether helicase loading led to a defined order of Cdc6 and Cdt1 release. We took two approaches to address this question: (1) we performed experiments in which Cdc6 and Cdt1 were labeled with different fluorophores, and (2) we compared the times of Cdc6 and Cdt1 release relative to the time of the corresponding Mcm2-7 association in the previously described double-labeled experiments with either Cdc6 or Cdt1 paired with labeled Mcm2-7.



When Cdc6 and Cdt1 were labeled in the same experiment, we consistently saw Cdc6 associating with and releasing from origin DNA before Cdt1 (Figure 4A; Figure S4A). Because only non-productive Cdt1-DNA interactions had dwell times less than 6 s (see Figure 2C), we excluded these molecules from our analysis. Cdc6 was released before Cdt1 in >95% of cases in which Cdt1 and Cdc6 were colocalized on a DNA (Figure 4B). When the fluorophores coupled to the proteins were swapped, >90% of observations showed that Cdc6 dissociates from origin DNA before Cdt1 (Figure S4B). This lower percentage is likely due to the higher photobleaching rate of the 649 dye (Table S2). These results suggest that Cdc6 is released prior to Cdt1 during helicase loading.
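The Figure 4B statistic, combined with the exclusion of short Cdt1 dwells, amounts to a filtered fraction. This sketch assumes each record holds both release times, measured from the start of co-localization, together with the total Cdt1 dwell time. The record layout and function name are hypothetical. Ties are counted as not Cdc6-first.

```python
from typing import Iterable, Tuple

def fraction_cdc6_first(records: Iterable[Tuple[float, float, float]],
                        min_cdt1_dwell: float = 6.0) -> float:
    """Fraction of co-localized Cdc6/Cdt1 pairs in which Cdc6 leaves first.

    Each record is (cdc6_release, cdt1_release, cdt1_dwell): both release times
    measured from the start of simultaneous presence, plus the total Cdt1 dwell.
    Cdt1 dwells shorter than `min_cdt1_dwell` s are dropped as non-productive.
    """
    kept = [(c6, c1) for c6, c1, dwell in records if dwell >= min_cdt1_dwell]
    if not kept:
        raise ValueError("no records left after filtering")
    return sum(c6 < c1 for c6, c1 in kept) / len(kept)

# Hypothetical records: the third is excluded (Cdt1 dwell < 6 s).
print(fraction_cdc6_first([(5, 10, 12), (8, 7, 9), (2, 4, 5)]))
```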

Because Mcm2-7 was unlabeled in the previous experiments, we did not know which of the Cdc6-Cdt1 DNA colocalization events directed double-hexamer formation. To address whether Cdc6 is released before Cdt1 during double-hexamer formation, we analyzed the time that each Cdc6 or Cdt1 molecule spent on the DNA together with Mcm2-7. Consistent with the Cdc6-Cdt1 double-labeling experiments, the average time between Mcm2-7 arrival and Cdc6 release is significantly shorter than the corresponding time before Cdt1 release (Figure 4C). Both the Cdc6 and Cdt1 release times are >50-fold shorter than the fluorescent dye lifetimes calculated from photobleaching rates (Table S2), verifying that these are dissociation events and not due to photobleaching. We conclude that each Mcm2-7 loading event is associated with the ordered release of Cdc6 followed by Cdt1 from the DNA.
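A short calculation with competing first-order rates shows why a >50-fold gap between release times and dye lifetimes rules out photobleaching. Whichever process happens first ends the spot, so the fraction of disappearances caused by bleaching is k_bleach/(k_bleach + k_off). The sketch treats release as a single exponential. That is a simplification given the multi-step release kinetics, so the result should be read as an order-of-magnitude estimate.

```python
def bleach_fraction(mean_release_time, dye_lifetime):
    """Fraction of spot disappearances expected to be photobleaching.

    Assumes release and photobleaching are independent first-order processes
    with rates 1/mean_release_time and 1/dye_lifetime. The first process to
    occur ends the spot, so P(bleach first) = k_bleach / (k_bleach + k_off).
    """
    k_off = 1.0 / mean_release_time
    k_bleach = 1.0 / dye_lifetime
    return k_bleach / (k_bleach + k_off)

# A dye lifetime 50-fold longer than the release time:
print(round(bleach_fraction(1.0, 50.0), 4))  # 0.0196
```

At a 50-fold separation, roughly 2% of disappearances would be bleaching events. That fraction is too small to change the release-order conclusions.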

Kinetic Evidence for Distinct Mechanisms Loading the First and Second Helicase

We reasoned that if loading of the first and second helicases occurred by different mechanisms, the time that Cdc6 and Cdt1 would spend associated with the first versus the second Mcm2-7 would differ. The resulting survival curves showed delays between arrival of Mcm2-7 and release of Cdc6 or Cdt1, suggesting that the release of both proteins involves multiple steps after Mcm2-7 recruitment.

Although the order of Cdc6 and Cdt1 release remained the same, we found that the release times were significantly longer for the second Mcm2-7 loading event for both Cdc6 (p < 0.003; Figure 4D) and Cdt1 (p < 0.001; Figure 4E). These kinetic data suggest that loading of the first and second helicase occurs through distinct mechanisms.

… arrival and departure of Cdc6 before Cdt1.

(B) Release of Cdc6 anticipates Cdt1 release in a majority of cases. Time of Cdt1 release (y axis) is plotted against time of Cdc6 release (x axis; both times are measured from the start of simultaneous presence of Cdc6 and Cdt1). The red line represents where points would fall if Cdc6 and Cdt1 released simultaneously. The fraction of measurements in which Cdc6 is released before Cdt1 is reported.

(C) Release of Cdc6 occurs before release of Cdt1 during double hexamer formation. Survival function for Cdc6 and Cdt1 dwell times after the first or second Mcm2-7 associates with origin DNA. The y axis represents the fraction of Cdc6 or Cdt1 molecules that are still associated after the time represented on the x axis.

(D and E) Cdc6 and Cdt1 release events are slower for the second versus the first Mcm2-7 loading events. (D) The time of Cdc6 release after Mcm2-7 association is plotted for the first (blue) and second (red) Mcm2-7 association as a survival plot (the fraction of Cdc6 molecules that remain DNA associated is plotted against time). Inset shows the first 40 s of the entire plot to emphasize the presence of a lag prior to DNA release. Numbers are mean release times ±SEM for the first or second Mcm2-7-associated Cdc6 molecule. (E) Cdt1 release after the first (blue) and second (red) Mcm2-7 association as a survival plot as described for (D).
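A release that requires several sequential steps would produce the lag seen at the start of the Cdc6 and Cdt1 survival curves. The paper does not fit this model; the sketch is an illustration only. It compares a one-step exponential release with an Erlang (gamma) release made of n equal-rate steps that has the same mean. The one-step curve starts dropping immediately. The multi-step curve stays near 1 at early times. The 20 s mean and the 3-step choice are arbitrary.

```python
import math

def erlang_survival(t, n_steps, mean_time):
    """Survival of a process that needs n sequential, equal-rate steps.

    The total time is Erlang (gamma) distributed with rate k = n / mean_time:
    S(t) = exp(-k t) * sum_{i=0}^{n-1} (k t)^i / i!
    """
    k = n_steps / mean_time
    x = k * t
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(n_steps))

mean = 20.0  # seconds; illustrative value only
for t in (0.0, 1.0, 5.0, 20.0):
    one_step = erlang_survival(t, 1, mean)
    three_step = erlang_survival(t, 3, mean)
    print(f"t={t:5.1f} s  one-step S={one_step:.3f}  three-step S={three_step:.3f}")
```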

A Single ORC Directs Formation of the Mcm2-7 Double Hexamer

There are multiple models for the stoichiometry of ORC during helicase loading (Figure S5A). One ORC molecule could direct both helicase loading events (model I). Alternatively, two ORC molecules could be present throughout the loading reaction (model II). Finally, distinct ORC molecules could direct each loading event, with both ORC molecules present only for the second loading event (model III), or, like Cdc6 and Cdt1, each ORC could be present only during loading of one Mcm2-7 (model IV). To distinguish between these models, we performed CoSMoS with simultaneous labeling of ORC and Mcm2-7.
Figure 5. A Single ORC Complex Directs Recruitment and Loading of the First and Second Mcm2-7 Hexamer

(A) Representative fluorescence intensity record for labeled ORC and Mcm2-7 at an origin-DNA location. Association of the first and second Mcm2-7 is marked with red and blue arrows, respectively.

(B) A single ORC complex directs recruitment of two hexamers. The fraction (±SE) of DNA molecules observed to have zero, one, or two ORC fluorophores bound when the second Mcm2-7 was recruited is plotted (bars) together with the predicted number of associated fluorophores (red and blue squares) for the different models (see Figure S5A).

(C) ORC is released rapidly after recruitment of the second Mcm2-7 hexamer. Histograms show the time between the association of the second Mcm2-7 and ORC release (top) or between association of the first Mcm2-7 and ORC release (bottom).

(D) Release of Cdc6 (blue), Cdt1 (red), and ORC (black) after the association of the second Mcm2-7 complex is plotted as a survival function. Two ORC molecules that associate for >400 s (1,033.8 s and 709.6 s) are not shown and disproportionately affect the mean dwell time. Gray lines represent a 95% confidence interval for the ORC data set, showing that there is no significant difference between Cdt1 and ORC release time distributions. Numbers in parentheses represent the mean release times ±SEM.



Initially, we fluorescently labeled ORC on the Orc1 subunit and observed associations with DNA in the presence of unlabeled Cdc6, Cdt1, and Mcm2-7. ORC DNA binding showed a broad distribution of dwell times (Figure S5B, left panel). Consistent with the long-lived associations reflecting ORC binding to the ACS, mutation of this element resulted in >94% of ORC DNA associations being short lived (<10 s; Figure S5B, right panel). The associations of ORC are shorter than the calculated fluorescent dye lifetimes, confirming that we are observing dissociations and not photobleaching (Figure S5C; Table S2).

To identify ORC molecules involved in helicase loading, we simultaneously monitored ORC and Mcm2-7 DNA associations (Figure 5A). As expected, ORC associated with DNA and Cdt1/Mcm2-7. Unlike for Cdc6 and Cdt1, we consistently observed a single increase in ORC fluorescence that remained present continuously during recruitment of the first and second Mcm2-7 complexes (Figures 5A and S5D).



Because ORC multimers have been detected (Sun et al., 
2012), we addressed whether ORC complexes dimerize in 
solution prior to DNA binding by counting the number of photo- 
bleaching steps associated with single increases in ORC-associ- 
ated fluorescence (as was described for Mcm2-7). The large 
majority of cases were consistent with ORC binding as a single 
complex (67 of 69; Figure S5E). These data confirmed that 
the single increases in ORC-associated fluorescence were due 
to single ORC molecules associating with origin DNA during 
loading. 

Although the majority of observations involved a single ORC 
directing loading of two Mcm2-7 hexamers, occasionally we 
observed the presence of multiple DNA-bound ORC molecules 
at the time of a Mcm2-7 association. To address which models 
for ORC function during helicase loading were possible, we 
counted the number of DNA-associated ORC molecules (by 
counting stepwise increases in ORC fluorescence) during the 
second Mcm2-7 hexamer association (Figure 5B). Models II 
Figure 6. Double-Hexamer Formation Occurs Quickly upon Recruitment of the Second Mcm2-7 Hexamer

(A) When the two fluorophores (green circle = Dy549, red circle = Dy649) are not associated, excitation of the donor (D ex) will only yield emission from the donor (D em). However, when the two fluorophores are in close proximity, we observe acceptor emission (A em) upon D ex, and a weaker D em signal. Wavelengths represent laser excitation and emissions that were monitored.

(B) Representative fluorescence records for experiments using a mixture of donor (549)- and acceptor (649)-labeled Mcm2-7, showing FRET upon arrival of the second Mcm2-7. Red squares highlight when acceptor-labeled Mcm2-7 associates with DNA (donor-labeled Mcm2-7 is already present), and blue squares highlight when FRET occurs. Images and records of fluorescence intensity for D ex/D em (donor-labeled Mcm2-7), A ex/A em (acceptor-labeled Mcm2-7), total emission (D ex/(D em + A em)), and FRET (D ex/A em) are shown together with calculated EFRET.

(C) Histogram of EFRET is plotted for times when a single donor-labeled and a single acceptor-labeled Mcm2-7 are present (blue bars) or when only donor-labeled Mcm2-7 is associated with the DNA (unfilled gray bars). The histogram displays the first ten consecutive EFRET measurements after arrival of the second Mcm2-7 for 86 DNA




520 Cell 161, 513-525, April 23, 2015 ©2015 Elsevier Inc. 




Cell 



and III predict two ORC molecules bound to DNA when the second Mcm2-7 is recruited. In contrast to these models, we observed two ORC molecules associated during loading of the second hexamer only 5% of the time (as opposed to the 70% predicted by model II or III using the measured ORC labeling efficiency of 85%; see Extended Experimental Procedures). Instead, we observed a single ORC present during association of the second helicase 80% (96/120) of the time, very close to the percentage expected if a single ORC is responsible for loading the second Mcm2-7 (85%). To distinguish between models I and IV, we asked whether the same or different ORC molecules directed the first and second helicase-loading events. Consistent with model I, 94% (n = 96) of observations showed a single ORC complex continuously present during both Mcm2-7 recruitment events. Thus, our data indicate that one ORC molecule directs loading of both the first and the second Mcm2-7 hexamer (model I).
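The model predictions quoted above follow from binomial labeling statistics. This minimal sketch assumes each bound ORC is labeled independently with the stated 85% efficiency. Under that assumption, two labeled ORCs are expected about 72% of the time, close to the ~70% cited. The authors' exact calculation is in the Extended Experimental Procedures and may include factors not modeled here. The function name is hypothetical.

```python
from math import comb

def labeled_count_probs(n_molecules, p_label):
    """P(k of n bound molecules carry a dye), k = 0..n, assuming independent labeling."""
    return [comb(n_molecules, k) * p_label**k * (1 - p_label)**(n_molecules - k)
            for k in range(n_molecules + 1)]

P_LABEL = 0.85  # measured ORC labeling efficiency quoted in the text
one_orc = labeled_count_probs(1, P_LABEL)    # models I and IV at the second loading event
two_orcs = labeled_count_probs(2, P_LABEL)   # models II and III
print("one ORC bound:  P(1 dye)  =", round(one_orc[1], 3))
print("two ORCs bound: P(2 dyes) =", round(two_orcs[2], 3))
print("observed: P(1) =", 96 / 120, " P(2) = 0.05")
```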

Interestingly, in most traces where two Mcm2-7 associate with the DNA, we observed dissociation of ORC from origin DNA soon after binding of the second Mcm2-7 hexamer (see Figures 5A and S5D). Plotting the times between the association of the second Mcm2-7 hexamer and ORC release (Figure 5C, blue bars), we observed only one instance where ORC released from DNA in <15 s (13.1 s), followed by a short time interval (15-90 s) during which 87% of the ORC complexes were released. The shape of this distribution suggests that, like release of Cdc6 and Cdt1, release of ORC is a multi-step process. In contrast, a much broader distribution was observed when ORC release was measured relative to DNA association of the first Mcm2-7 hexamer (Figure 5C, red bars), suggesting that ORC release is independent of this event. To investigate the order of ORC release relative to the other helicase-loading proteins, we compared the distributions of ORC, Cdc6, and Cdt1 dwell times after binding of the second Mcm2-7 complex (Figure 5D), using data from two-color experiments with labeled Mcm2-7 and 549-labeled ORC, Cdt1, or Cdc6. Photobleaching of the 549-labeled proteins was insignificant relative to their observed dwell times (Table S2). Although there is a significant difference between release of Cdc6 and ORC (p < 0.001), we saw no significant difference in the distributions of Cdt1 and ORC release (Figure 5D). Thus, loading of the first Mcm2-7 allows ORC retention, whereas loading of the second Mcm2-7 appears to induce the linked release of ORC and the second Cdt1.

Recruitment of a Second Mcm2-7 Results in Rapid Double-Hexamer Formation

The interactions that drive recruitment of the second Mcm2-7 remain unclear (Yardimci and Walter, 2014). To gain insight into this event, we used fluorescence resonance energy transfer (FRET)-CoSMoS (Crawford et al., 2013) to detect the proximity of the Mcm7 N-terminal domains upon double-hexamer formation (Costa et al., 2014; Sun et al., 2014). To this end, we labeled the Mcm7 subunit in separate preparations of Mcm2-7 with either a 549 (donor) or a 649 (acceptor) fluorophore (Figure 6A). When mixed in an equimolar ratio, the differently labeled Mcm2-7 should be in the same double hexamer ~50% of the time, and those molecules should exhibit substantial FRET efficiency (EFRET) because the Mcm7 N-terminal regions are in close proximity in the double hexamer (Sun et al., 2014). We alternated between 633 and 532 nm laser excitation to monitor both the arrival of each Mcm2-7 and EFRET. Importantly, when donor- and acceptor-labeled Mcm2-7 were sequentially recruited to the origin DNA (in either order), we observed rapid development of a high-EFRET (~0.53) state (Figures 6B and 6C, blue bars; Figure S6). A second peak at EFRET ~0.02 was also observed in the absence of acceptor (Figure 6C, unfilled gray bars) and thus represents state(s) with no detectable FRET. Consistent with the detected FRET signal occurring as a consequence of double-hexamer formation, the high-EFRET state was stable for hundreds of seconds, and 95% (55/58) of the complexes that exhibited EFRET ~0.53 were high-salt resistant.
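Two quantities in this paragraph can be reproduced with short formulas. This sketch rests on three assumptions. First, it uses background-subtracted intensities under donor excitation. Second, it applies no corrections for detection efficiency, donor bleed-through, or direct acceptor excitation; the paper's exact EFRET processing is not described in this section. Third, each double hexamer is assumed to draw its two hexamers independently from an equimolar donor/acceptor pool, with unlabeled hexamers ignored. The function names are hypothetical.

```python
def proximity_ratio(acceptor_em, donor_em):
    """Apparent FRET efficiency from intensities recorded under donor excitation.

    Uncorrected: ignores detection-efficiency (gamma), donor bleed-through and
    direct acceptor excitation, all of which a careful analysis would correct.
    """
    total = acceptor_em + donor_em
    if total <= 0:
        raise ValueError("total emission must be positive")
    return acceptor_em / total

def donor_acceptor_pair_fraction(donor_fraction):
    """Fraction of double hexamers carrying one donor- and one acceptor-labeled hexamer.

    Assumes the two hexamers are drawn independently from the labeled pool.
    """
    return 2 * donor_fraction * (1 - donor_fraction)

print(donor_acceptor_pair_fraction(0.5))  # 0.5 for an equimolar mix
print(proximity_ratio(53.0, 47.0))        # hypothetical intensities
```

The first function returns the ~50% quoted for an equimolar mix. A high-EFRET state corresponds to acceptor emission making up about half of the total donor-excited signal.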

To determine when double-hexamer formation occurs relative to binding of the second Mcm2-7, we compared the time of FRET formation to the time of arrival of the second Mcm2-7 (Figure 6D). We found that the mean time between recruitment of the second Mcm2-7 hexamer and formation of FRET was 7.8 ± 0.1 s. This time is significantly shorter than release of Cdc6 after arrival of the second Mcm2-7 hexamer (23.2 ± 1.7 s, p < 0.001), indicating that formation of the N-terminal-to-N-terminal interactions anticipates, and is therefore independent of, Cdc6 and Cdt1 release (Figure 6D).
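Comparisons such as 7.8 ± 0.1 s versus 23.2 ± 1.7 s report a mean and its standard error. The following minimal sketch computes that summary with the Python standard library. This section does not state which significance test produced the quoted p values, so the sketch does not implement one. The function name and sample values are hypothetical.

```python
import math
import statistics

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical per-molecule delays (s) from second Mcm2-7 arrival to FRET onset.
m, sem = mean_sem([6.0, 8.0, 7.5, 9.0, 8.5])
print(f"{m:.1f} ± {sem:.1f} s")
```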

DISCUSSION 

By determining precise protein/DNA stoichiometry and real-time 
dynamics, the single-molecule observations of helicase loading 
described here provide important insights into this event. 
Together, our findings strongly support a model in which the first 
and second helicase are loaded by distinct mechanisms and the 
second Mcm2-7 complex is recruited through interactions with 
the first. Accordingly, we propose a new model for helicase 
loading that is consistent with our current data and is described 
below (Figure 7). 

Recruitment and Loading of Mcm2-7 Helicases Occur in a One-at-a-Time Manner

Monitoring associations in real time reveals sequential recruitment and loading of Mcm2-7 helicases to origin DNA. One-at-a-time recruitment is consistent with an initial complex containing a single Mcm2-7 associated with the three helicase-loading proteins (Sun et al., 2013) and with ensemble assays that show temporal separation of Mcm2-7 recruitment (Fernandez-Cid et al., 2013). Recent structural observations indicate that the Mcm2/5



molecules (the same number of molecules and time points were used for the control). EFRET data below -0.5 were excluded from the plot (3/860 signal points and 17/860 control points).

(D) Double-hexamer formation anticipates Cdc6, Cdt1, and ORC release. Survival after the association of the second Mcm2-7 complex of the no-FRET state (green) and of DNA-bound Cdc6 (blue), Cdt1 (red), and ORC (black). Mean times ±SEM until FRET increase and ORC, Cdt1, and Cdc6 release are reported for comparison.
gates, which must open to provide DNA access to the Mcm2-7
central channel (Samel et al., 2014), are staggered in the double 
hexamer (Costa et al., 2014; Sun et al., 2014). Concerted 
Mcm2-7 loading would require alignment of the two Mcm2/5 
gates to allow simultaneous DNA entry into the central channels 
of both hexamers. In contrast, sequential Mcm2-7 loading can 
readily accommodate the formation of a staggered-gate dou- 
ble-hexamer structure. 

Although high-salt-resistant single hexamers have been detected after artificially closing the Mcm2/5 gate (Samel et al., 2014), previous studies have not detected single loaded (high-salt-resistant) Mcm2-7 complexes in unperturbed helicase-loading reactions (Evrin et al., 2009; Kang et al., 2014; Remus et al., 2009). This difference may be due to the higher protein concentrations used in these ensemble reactions. Alternatively, the high-salt-resistant single hexamers may be less stable than the double hexamers, resulting in their loss during sample preparation for chromatography or EM. Indeed, a higher percentage of double hexamers showed high-salt resistance relative to single hexamers (74% versus 49%; see Figure 1D). The high-salt wash is effective in the single-molecule assay setting, however, as this treatment efficiently releases incompletely loaded Mcm2-7 formed in the absence of ATP hydrolysis (Table S1, ATPγS).



Ordered Release of Cdc6 and Cdt1 Molecules during Double-Hexamer Loading

Our studies provide insights into Cdc6 and Cdt1 function during helicase loading. Previously, robust DNA association of these proteins was only observed when helicase-loading reactions were arrested at an early ATP-dependent step. We found that the initial ORC-Cdc6-Cdt1-Mcm2-7 (OC6C1M) complex has two possible fates (Figure 7, left): (1) simultaneous release of Mcm2-7 and Cdt1 or (2) sequential release of Cdc6 and Cdt1 with retention of Mcm2-7. The former is most likely the reversal of the initial Mcm2-7/Cdt1 association, whereas the latter pathway leads to sequential formation of OC1M and OM complexes and Mcm2-7 loading. Based on this distinction, we propose that release of Cdt1 independent of Mcm2-7 is coupled to successful helicase loading (illustrated as closing of the Mcm2/5 gate; Figure 7). Consistent with this hypothesis, treatments (e.g., ATPγS) or mutations (e.g., Mcm2-7 ATPase mutations; Coster et al., 2014; Kang et al., 2014) that lead to Cdt1 retention prevent helicase loading. We note that times of ring closure (and opening) other than those illustrated in the model are possible.

Electron microscopic (EM) and ensemble assays suggest the existence of helicase-loading intermediates with ORC-Cdc6-Mcm2-7 (OC6M) and ORC-Cdc6-Mcm2-7-Mcm2-7 (OC6MM; Sun et al., 2014). Our findings suggest that the OC6M complex is a short-lived intermediate formed prior to recruitment of the second Mcm2-7/Cdt1 complex rather than being formed by release of Cdt1 from the OC6C1M (Fernandez-Cid et al., 2013). We do not see evidence of an OC6MM complex during helicase loading, and there is no direct evidence that Cdc6 is present in the 2D class averages used in these studies (Sun et al., 2014). Given their relatively low resolution, these studies could have detected either the OC6C1MM or the OC1MM complexes that we observe (Figure 7, right). Our previous studies found an intermediate with two Cdt1 complexes that is not detected in the current studies (Takara and Bell, 2011). During efforts to reconcile these findings, we found that the Mcm2-7 protein used in the previous studies contained a non-lethal mutation in the C terminus of Mcm3 that is predicted to inhibit Cdc6 interactions (Frigola et al., 2013). We suspect that this mutant enhanced dependence on other interactions, leading to the detection of two Cdt1 associations.



Loading of the First and Second Mcm2-7 Occurs by 
Distinct Mechanisms 

In addition to answering a long-standing question about ORC 
function, our data indicating that one ORC molecule directs 
Mcm2-7 double-hexamer formation strongly suggest that 
different mechanisms direct loading of the first and second
Mcm2-7. EM studies suggest that during helicase loading 
ORC interacts with the C-terminal end of the first Mcm2-7 on 
adjacent DNA (Sun et al., 2014; 2013). Assuming this configura- 
tion, direct recruitment of the second Mcm2-7 complex by the 
same ORC would load the two Mcm2-7 molecules in a head- 
to-tail fashion (Figure S7, top). Even if ORC had a second binding 
site for Mcm2-7 on its opposite side, a similar direct interaction 
with Mcm2-7 could not load two Mcm2-7 complexes with adja- 
cent N-terminal domains (Figure S7, bottom). Further evidence in 
favor of distinct mechanisms loading the first and second Mcm2-7 includes the following: (1) the two loading events show different Cdc6, Cdt1, and ORC release kinetics, and (2) Cdt1 associated with the second loading event shows an increased propensity to release without Mcm2-7.

We considered the possibility that a second ORC binds DNA in 
the opposite orientation and loads the second helicase by the 
same mechanism as the first. Several observations argue 
against this model. First, because we do not consistently detect 
a second ORC during recruitment of the second Mcm2-7, the 
average dwell time for this second ORC would have to be below 
our detection limit (~0.5 s). This limit is >10-fold shorter than the
average dwell time observed for ORC on non-origin DNA (Fig- 
ure S5B). Second, in contrast to a model in which a short-lived 
second ORC directs loading of the second Mcm2-7, the Cdc6 
protein associated with loading the second Mcm2-7 is easily de- 
tected (23.2 s average dwell time; Figure 4D). Third, even at diffu- 
sion-limited binding rates the sequential association of Cdc6 and 
Mcm2-7/Cdt1 with such a short-lived ORC is improbable. 
Finally, experiments showing that soluble ORC is not required 
for helicase loading if ORC is pre-loaded onto DNA (Bowers 
et al., 2004; Fernandez-Cid et al., 2013; Duzdevich et al., 2015) 
are not consistent with a role for a short-lived second ORC. 
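
The third argument, that sequential Cdc6 and Mcm2-7/Cdt1 binding to a short-lived ORC is improbable, can be made semi-quantitative with a back-of-envelope estimate. The sketch below is illustrative only and is not the authors' calculation. The on-rate `K_ON` is an assumed value, and the mean ORC dwell time is set to the ~0.5 s detection limit as an upper bound. The concentrations are those of the standard reactions described in the Experimental Procedures. Binding and ORC dissociation are treated as competing exponential processes.

```python
def p_binds_first(k_on: float, conc_m: float, k_off: float) -> float:
    """Probability that binding (pseudo-first-order rate k_on*[L], s^-1)
    occurs before ORC dissociates (rate k_off, s^-1), treating both as
    competing exponential (memoryless) processes."""
    k_bind = k_on * conc_m
    return k_bind / (k_bind + k_off)


# Assumed, illustrative parameters (not measurements from this study):
K_ON = 1e8                 # M^-1 s^-1, assumed on-rate for each binding step
ORC_MEAN_DWELL_S = 0.5     # upper bound set by the ~0.5 s detection limit
K_OFF = 1.0 / ORC_MEAN_DWELL_S

CDC6_M = 1e-9              # 1 nM Cdc6 (standard reaction)
MCM_M = 2.5e-9             # 2.5 nM Cdt1/Mcm2-7 (standard reaction)

p_cdc6 = p_binds_first(K_ON, CDC6_M, K_OFF)
# Memorylessness: ORC's remaining lifetime after Cdc6 binds has the same
# distribution, so the two steps multiply. Cdc6 release is ignored, which
# makes this an upper bound under these assumptions.
p_mcm = p_binds_first(K_ON, MCM_M, K_OFF)
p_both = p_cdc6 * p_mcm

if __name__ == "__main__":
    print(f"{p_cdc6:.3f} {p_mcm:.3f} {p_both:.4f}")  # 0.048 0.111 0.0053
```

The estimate scales with the assumed on-rate and ORC lifetime, so it shows how the argument can be reasoned through rather than reproducing a published number.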

Recruitment of the Second Mcm2-7 

Instead of ORC and Cdc6 directly recruiting the second 
Mcm2-7/Cdt1 complex, our findings suggest that interactions 
involved in stabilizing the Mcm2-7 double hexamer mediate 
the recruitment of the second Mcm2-7/Cdt1. We detect these
interactions prior to Cdc6 or Cdtl release (Figure 6), suggesting 
that formation of double-hexamer interactions anticipates 
loading of the second helicase. Recent EM studies of a complex 
between one ORC and a head-to-head Mcm2-7 double hex- 
amer are consistent with this hypothesis (Sun et al., 2014). 
Because FRET is not observed immediately upon recruitment 
of the second Mcm2-7, an intervening event (e.g., a Mcm2-7 
conformational change or ATP hydrolysis) may be required to 
bring the Mcm7 N-terminal domains into close proximity. We 
do not know which parts of the Mcm2-7 N-terminal domains 
drive the proposed interactions. For simplicity, the model (Fig- 
ure 7) illustrates interactions consistent with those observed in 
EM studies of Mcm2-7 double hexamers (Costa et al., 2014; 
Sun et al., 2014). One argument against a model in which
Mcm2-7 N-terminal domains drive recruitment of the second 
Mcm2-7 is the observation that a C-terminal mutation in Mcm3 
that interferes with recruitment of the first Mcm2-7 also inhibits 
recruitment of the second Mcm2-7 (Frigola et al., 2013). This 
mutant has additional defects in Mcm2-7 ATP hydrolysis, however, which could explain a loading defect for the second Mcm2-7 (Coster et al., 2014; Kang et al., 2014; Sun et al., 2014).

Because purified Mcm2-7 complexes do not show affinity for 
one another in solution (Evrin et al., 2009), the first Mcm2-7 must 
be altered to facilitate interactions with a second Mcm2-7. A 
likely possibility is that ORC and Cdc6 alter the conformation of the first Mcm2-7 to facilitate these interactions (shown as separation of the Mcm2/Mcm5 N-terminal regions; Sun et al., 2013). In support of a role for Cdc6, although we observe an ORC-Mcm2-7 (OM) intermediate after the first loading event, this complex is unable to recruit a second Mcm2-7 until a second Cdc6 protein associates (OC6M).

The model for helicase loading presented here has several 
advantageous features. Because Cdc6 ATPase activity is 
required to remove incorrectly or incompletely loaded Mcm2-7 
(Coster et al., 2014; Frigola et al., 2013; Kang et al., 2014), the 
use of different Cdc6 proteins to load the first and second 
Mcm2-7 would allow each event to be evaluated separately. 
More importantly, the use of Mcm2-7 N-terminal domain inter- 
actions to recruit the second Mcm2-7 ensures the establishment 
of a head-to-head double hexamer. This conformation is the first 
step in the establishment of bidirectional replication initiation and 
could be essential for initial DNA melting. Finally, the retention of 
ORC after the first loading event coupled with the release of ORC
after the second loading event has the advantage of promoting 
the formation of double hexamers while inhibiting repeated 
loading of single hexamers. 

EXPERIMENTAL PROCEDURES 
Protein Purification and Labeling

Wild-type Mcm2-7/Cdt1 and ORC complexes were purified as described previously (Kang et al., 2014). Wild-type Cdc6 was purified as described in Frigola et al. (2013). We used a variety of protein fusions to fluorescently label ORC (Ubiquitin-GGG-Flag at the N terminus of Orc1), Cdc6 (GST-SUMO-GGG tag at the N terminus), and Cdt1/Mcm2-7 (Ubiquitin-GGG-Flag at the N terminus of Mcm7 or Cdt1, and/or a SNAP-tag (NEB) at the N terminus of Mcm4). The Ubiquitin (in vivo) and GST-SUMO (using Ulp1 protease) fusions were removed to reveal three N-terminal glycines required for sortase labeling. Sortase was used to couple fluorescently labeled peptide (DY549P1- or DY649P1-CHHHHHHHHHLPETGG; referred to as SORT549 and SORT649, respectively) to the N terminus of these proteins. SNAP-Surface549 (NEB; SNAP549) or SNAP-Janelia Fluor 646 (SNAPJF646; Grimm et al., 2015) was coupled to SNAP-tagged Mcm2-7. For sortase labeling, peptide-coupled proteins were separated from uncoupled proteins using Complete His-Tag Resin (Roche). See Extended Experimental Procedures for these purification protocols. Yeast strains and plasmids used are listed in Tables S3 and S4, respectively.

Single-Molecule Microscopy 

The micro-mirror total internal reflection (TIR) microscope used for multi-wavelength single-molecule imaging with excitation wavelengths of 488, 532, and 633 nm has been described previously (Friedman and Gelles, 2012; Friedman et al., 2006). Biotinylated, Alexa-Fluor-488-labeled, 1.3-kb-long DNA molecules containing an origin were coupled to the surface of a reaction chamber through streptavidin. Briefly, the chamber surface was cleaned and derivatized using a 200:1 ratio of silane-NHS-PEG and silane-NHS-PEG-biotin (see Extended Experimental Procedures). We identified DNA molecule locations by acquiring four to seven images with 488-nm excitation at the beginning of the experiment. Unless otherwise noted, helicase-loading reactions contained 0.25 nM ORC, 1 nM Cdc6, and 2.5 nM Cdt1/Mcm2-7. Reaction buffer was as previously described (Kang et al., 2014), except without any glycerol and with the addition of 2 mM dithiothreitol, 2 mg/ml bovine serum albumin (EMD Chemicals), and an oxygen-scavenging system (glucose oxidase/catalase) to minimize photobleaching (Friedman et al., 2006). After addition of protein to the DNA-coupled chamber, frames of 1-s duration were acquired according to the following protocol: (1) a single image frame visualizing the DNA positions (488-nm excitation), (2) 60 frames monitoring both the 549 and 649 fluorophores (simultaneous 532- and 633-nm excitation), and (3) a computer-controlled focus adjustment (using a 785-nm laser). This cycle was repeated roughly 20 times over the course of an experiment (~20 min). Chambers were then washed with either three chamber volumes of reaction buffer, or two volumes of the same buffer with 0.5 M NaCl in place of 300 mM K-glutamate followed by one volume of reaction buffer. For photobleaching, laser power(s) were increased, and one or multiple fluorophores were imaged simultaneously until no visible spots remained. Typically, photobleaching was also examined in a second field of view that was not imaged during the loading reaction.
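
The acquisition cycle described above can be written out as a frame schedule. This makes it easy to check that about 20 cycles correspond to the stated ~20 min. It is a minimal sketch: the `Frame` record and `build_schedule` function are hypothetical, and the 785-nm focus adjustment and any dead time between frames are assumed to take no time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    start_s: float   # time at which the 1-s exposure begins
    excitation: str  # "488" (DNA) or "532+633" (both protein dyes)


def build_schedule(
    n_cycles: int = 20, frames_per_cycle: int = 60, frame_s: float = 1.0
) -> list[Frame]:
    """One 488-nm DNA frame, then `frames_per_cycle` frames with simultaneous
    532/633-nm excitation, repeated `n_cycles` times. The 785-nm focus
    adjustment between cycles is assumed to take no time."""
    frames: list[Frame] = []
    t = 0.0
    for _ in range(n_cycles):
        frames.append(Frame(t, "488"))
        t += frame_s
        for _ in range(frames_per_cycle):
            frames.append(Frame(t, "532+633"))
            t += frame_s
    return frames


if __name__ == "__main__":
    schedule = build_schedule()
    duration_min = (schedule[-1].start_s + 1.0) / 60
    print(len(schedule), round(duration_min, 1))  # 1220 20.3
```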

FRET Experiments 

The conditions for monitoring FRET were similar to the other experiments, with 
a few exceptions. Typical reactions contained 0.75 nM ORC, 3 nM Cdc6, and 5 nM each of two Cdt1/Mcm2-7 preparations, one labeled with the 549 (donor) fluorophore and the other with the 649 (acceptor) fluorophore. DNA was imaged before and immediately after adding the reaction to the slide, but not throughout the experiment. The imaging protocol alternated between 1-s frames with the 532-nm laser on and 1-s frames with the 633-nm laser on over 20-30 min. Apparent E_FRET was calculated as described (Crawford et al., 2013).
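
For orientation, the sketch below uses the common uncorrected "apparent" FRET efficiency (proximity ratio), E = I_A / (I_A + I_D), computed from donor-excitation frames only. It assumes background-subtracted intensities. The exact procedure and any corrections in Crawford et al. (2013) are not reproduced here, and the function names are hypothetical.

```python
from typing import Sequence


def apparent_fret(donor: float, acceptor: float) -> float:
    """Uncorrected proximity ratio E = I_A / (I_A + I_D).

    donor, acceptor: background-subtracted intensities recorded during a
    donor-excitation (532-nm) frame.
    """
    total = donor + acceptor
    if total <= 0:
        raise ValueError("donor + acceptor intensity must be positive")
    return acceptor / total


def apparent_fret_trace(
    frames: Sequence[tuple[int, float, float]],
) -> list[float]:
    """frames: (excitation_nm, donor_intensity, acceptor_intensity) per frame.

    With alternating excitation, only 532-nm frames report FRET; 633-nm
    frames (direct acceptor excitation) are skipped here.
    """
    return [apparent_fret(d, a) for ex, d, a in frames if ex == 532]


if __name__ == "__main__":
    frames = [(532, 300.0, 100.0), (633, 0.0, 500.0), (532, 100.0, 100.0)]
    print(apparent_fret_trace(frames))  # [0.25, 0.5]
```

In an alternating-excitation scheme, the 633-nm frames are typically used to confirm that an acceptor-labeled molecule is present, rather than to compute E.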

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven 
SUMMARY

Transcription is a highly dynamic process. Conse- 
quently, we have developed native elongating 
transcript sequencing technology for mammalian 
chromatin (mNET-seq), which generates single- 
nucleotide resolution, nascent transcription profiles. 
Nascent RNA was detected in the active site of RNA 
polymerase II (Pol II) along with associated RNA 
processing intermediates. In particular, we detected 
5' splice site cleavage by the spliceosome, showing that cleaved upstream exon transcripts are associated with Pol II CTD phosphorylated on the serine 5 position (S5P), which accumulates over downstream exons. Also, depletion of termination factors
substantially reduces Pol II pausing at gene ends, 
leading to termination defects. Notably, termination 
factors play an additional promoter role by restricting 
non-productive RNA synthesis in a Pol II CTD S2P- 
specific manner. Our results suggest that CTD 
phosphorylation patterns established for yeast tran- 
scription are significantly different in mammals. 
Taken together, mNET-seq provides dynamic and 
detailed snapshots of the complex events underlying 
transcription in mammals. 

INTRODUCTION 

Virtually all transcripts synthesized by RNA polymerase II (Pol II) 
from protein-coding genes are co-transcriptionally processed to 
generate the final functional mRNA (Moore and Proudfoot, 2009). 
First, a cap structure (m7Gppp) is added to the transcript 5' end
soon after transcriptional initiation, which ultimately earmarks 
transcripts for efficient cytoplasmic translation. Then as the 
polymerase proceeds to elongate through the gene body (GB), 
intronic RNA, which often constitutes the majority of the primary 
transcript in mammalian genes, is removed by a splicing mechanism involving the stepwise assembly of a complex set of small nuclear RNAs (snRNAs) and associated proteins that together make up the spliceosome (Wahl et al., 2009). In outline, the U1 snRNA-protein complex (U1 snRNP) identifies the intron 5' splice site (SS) as soon as it is transcribed by Pol II, and then, on reaching the 3' end of the intron, multiple snRNPs (U2, U4, U5, and U6) recognize
the 3'SS and proximal intronic branch point on the nascent tran- 
script. Following reorganization of snRNP/intron interactions, the 
branch point A nucleotide carries out a 2'OH nucleophilic attack 
on the 5'SS, resulting in cleavage of the intron from the upstream 
exon. The newly formed upstream exon 3'OH then undergoes a 
second nucleophilic attack on the 3'SS, resulting in precise 
fusion of adjacent exons and release of the intron. Prior to intron 
splicing, hairpin structures embedded within some introns are 
excised by the double-strand RNA-specific microprocessor 
complex. This comprises an RNA-binding protein DGCR8 
together with the endonuclease Drosha, which facilitate release 
of pre-microRNA (miRNA) hairpins from the nascent transcript. 
These pre-miRNA go on to form cytoplasmic miRNA, which 
are critical for the translational regulation of many mRNA (Krol 
et al., 2010). Finally at gene 3' ends, a further RNA-processing re- 
action involving cleavage of the nascent transcript at a specific 
poly(A) signal (PAS) occurs. This RNA cleavage reaction is medi- 
ated by an endonuclease (CPSF73) that is part of a large multi- 
meric cleavage and polyadenylation complex. A poly(A) tail is 
then added to the mRNA 3' end, promoting rapid release of 
mRNA from the chromatin template (Proudfoot, 2011). Although 
these individual RNA-processing mechanisms are well charac- 
terized, their interconnections with transcription remain enig- 
matic. We describe in this study a method to investigate these 
interconnections, genome wide. 

The above outlined co-transcriptional pre-mRNA-processing 
reactions are precisely coordinated with the Pol II transcription 
cycle that proceeds from initiation at the transcription start site 
(TSS), leading on to elongation through the GB and ending 
with release of the mRNA at the PAS, also called the transcription 
end site (TES). Finally, termination occurs whereby Pol II sepa- 
rates from the DNA template. Both the Pol II transcription cycle 
and coupled pre-mRNA-processing reactions are orchestrated 
by a unique structural feature of Pol II. This comprises an 
extended C-terminal domain (CTD) of the large subunit (Rpb1) that has a heptad structure YSPTSPS repeated 52 times with some variation in mammals and 26 times in budding yeast. This
CTD is separate from the main globular enzyme, being posi- 
tioned close to the RNA exit channel. It is relatively unstructured 
(Meinhart and Cramer, 2004) and subject to extensive post- 
translational modification, especially phosphorylation of S2 
and S5 but also Y1, T4, and S7 (Heidemann et al., 2013; Hsin 
and Manley, 2012). This combined but differential CTD phos- 
phorylation is often considered to be a molecular code that 
acts to orchestrate transcription and coupled pre-mRNA pro- 
cessing. Especially in simpler eukaryotes, such as budding 
yeast, CTD S5P is correlated with TSS-associated events, 
whereas S2P is thought to correlate with TES events (Buratow- 
ski, 2009). However in the larger and more complex genes of 
mammals, this CTD code may be less clear-cut and vary be- 
tween different gene classes. 
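
Labels such as S2P and S5P refer to residue positions within each YSPTSPS heptad, not within the whole Rpb1 protein. The sketch below makes that indexing explicit for an idealized CTD built only from consensus repeats. This is an assumption for illustration, since real CTDs deviate from the consensus in many repeats. The constant and function names are hypothetical.

```python
# Idealized model of the Pol II CTD; real CTDs contain non-consensus repeats.
CONSENSUS_HEPTAD = "YSPTSPS"  # positions Y1 S2 P3 T4 S5 P6 S7


def phosphosite_labels(heptad: str = CONSENSUS_HEPTAD) -> list[str]:
    """Label the phosphoacceptor residues (Y, S, T) by heptad position."""
    return [f"{aa}{pos}" for pos, aa in enumerate(heptad, start=1) if aa in "YST"]


def idealized_ctd(n_repeats: int) -> str:
    """Concatenate n perfect consensus heptads (an idealization)."""
    return CONSENSUS_HEPTAD * n_repeats


if __name__ == "__main__":
    print(phosphosite_labels())  # ['Y1', 'S2', 'T4', 'S5', 'S7']
    for organism, n in (("mammals", 52), ("budding yeast", 26)):
        print(organism, len(idealized_ctd(n)))  # 364 and 182 residues
```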

To gain a more complete understanding of the Pol II transcrip- 
tion cycle and how this is coordinated with co-transcriptional 
pre-mRNA processing, genome-wide analysis of nascent RNA 
has been undertaken. For example, global nuclear run on- 
sequencing (GRO-seq) and precision nuclear run on-sequencing 
(PRO-seq) with modified nucleotides (Core et al., 2008; Kwak 
et al., 2013) provide a way to study Pol II profiles associated 
with nascent transcription. Similarly, 5' capped nascent RNA 
isolated from insoluble chromatin can be sequenced at high resolution (3'NT-seq) (Weber et al., 2014). These approaches generated detailed maps of Pol II nascent transcription in mammals and flies, showing that Pol II accumulates at promoters, providing a regulated transition from initiation into productive elongation (Adelman and Lis, 2012; Core and Lis, 2008; Gilchrist et al., 2010; Rahl et al., 2010). Precise maps of PRO-seq reads identified two different
types of Pol II pausing at the TSS, referred to as proximal and 
distal TSS pausing. PRO-seq additionally showed Pol II accumu- 
lation near 3'SS, possibly important for the selection of active 
exons (Kwak et al., 2013). GRO-seq has also shown a correlation
between Pol II density and nucleosome occupancy as observed 
at the TES of many genes, suggesting a connection with tran- 
scription termination (Grosso et al., 2012). A significant limitation 
to these above nascent RNA-mapping techniques is that the 
relationship between Pol II CTD modification and nascent RNA 
was not established. 

Precise maps of Pol II nascent RNA have also been generated 
by native elongating transcript sequencing (NET-seq) in yeast 
(Churchman and Weissman, 2011). Here, endogenous Pol II is 
flag tagged by genomic integration allowing immunoprecipita- 
tion (IP) of Pol II nascent RNA complexes. However, again con- 
nections between Pol II CTD modifications and nascent RNA 
could not be determined. In contrast, we establish mammalian 
NET-seq technology (mNET-seq) using a selection of CTD phosphorylation-specific Pol II antibodies to IP Pol II-associated transcripts. In detail, we have compared low or unphosphorylated (unph), S2P, S5P, and total (unph+ph) CTD mNET-seq profiles and show that unph CTD Pol II-nascent RNA accumulates over the TSS, whereas S2P Pol II-nascent RNA is spread throughout the GB and TES. Remarkably, S5P profiles precisely correlate with active splicing on protein-coding genes. An important feature of our analysis is that we are able to directly detect the initial 5'SS cleavage step in intron splicing and can also observe active Drosha cleavage of pre-miRNA hairpin structures present in gene introns. In effect, our extensive mNET-seq data
sets provide a “treasure trove” of detailed information on 
nascent transcription and co-transcriptional RNA processing in 
mammalian cells. 

RESULTS 

mNET-Seq Strategy 

To detect unstable nascent RNA across the human genome, we 
isolated a nuclear chromatin fraction from HeLa cells enriched in 
transcriptionally active Pol II (Pol IIo) and associated nascent RNA (Figure 1A) (Nojima et al., 2013). This chromatin-bound RNA was fragmented to 150-200 nt and ligated to adaptors for strand-specific paired-end deep sequencing (Figure S1A, top; Experimental Procedures). ChrRNA-seq detects unstable RNA, such as promoter upstream transcripts (PROMPTs), introns, and read-through transcripts (Figures 1D and S1B). For mNET-seq, chromatin was digested with micrococcal nuclease (MNase) to release Pol II from insoluble chromatin. Note that accessible RNA will also be digested (Figure S1A, bottom; Experimental Procedures). Western blot analysis using Pol II 8WG16 antibody confirmed that both phosphorylated (Pol IIo) and unphosphorylated (Pol IIa) forms were released in an MNase dose-dependent manner (Figure 1B). Nascent RNA distribution was also tested after cell fractionation and MNase digestion by using nuclear run-on (NRO) nuclei labeled with [α-32P]UTP (Figure 1C). The nucleoplasmic (Np) fraction contained long 32P-RNA (over 600 nt). However, after MNase digestion (40 U/µl), the residual chromatin pellet (P) contained RNA of 10-600 nt, whereas the chromatin supernatant (S) had shorter RNA of 10-200 nt. This supernatant fraction was then IPed using Pol II 8WG16 antibody, which efficiently precipitated this shorter RNA. Although the predominant size of the IPed RNA was 20-45 nt, we selected a longer RNA fraction (35-100 nt) to obtain unique alignment with the human genome after deep sequencing. In this method, the Pol II complex protects nascent RNA from MNase digestion. The hydroxylated 3' end (3'OH) of the nascent RNA corresponds to the terminal nucleotide synthesized by Pol II (Figure 1A, asterisk). The 5' end of the cleaved Pol II-associated RNA is also hydroxylated after MNase digestion. To achieve strand-specific RNA sequencing, we carried out a kinase reaction on the IP beads to phosphorylate all nascent RNA 5' ends but leave the Pol II-embedded 3'OH intact (Figure S1A). Illumina adapters were then ligated onto gel-purified RNA, and Illumina high-throughput paired-end sequencing was carried out, generating ~10^8 reads for each mNET-seq sample. For library construction, we omitted the NRO step because the NRO reaction may perturb the native Pol II distribution. The above Pol II IP from MNase-treated chromatin, coupled with isolation and sequencing of the associated RNA, constitutes a refined mammalian NET-seq protocol.
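
The key step in turning mNET-seq reads into a profile is mapping the 3'OH nucleotide of each RNA insert, which marks the Pol II active site, to a single genomic position on the correct strand. The sketch below is a minimal illustration and not the authors' pipeline. The `InsertAlignment` record and function names are hypothetical. It assumes that reads have already been aligned and converted to RNA-strand insert intervals; in practice, which mate reports the 3' end and its orientation depend on the library design.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class InsertAlignment(NamedTuple):
    """Hypothetical, simplified record for one aligned RNA insert.

    start/end are 0-based, half-open genomic coordinates of the whole insert;
    strand is that of the RNA itself.
    """
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def rna_3prime_end(aln: InsertAlignment) -> tuple[str, str, int]:
    """Genomic position of the insert's 3' nucleotide (the 3'OH end)."""
    if aln.strand == "+":
        return (aln.chrom, "+", aln.end - 1)
    if aln.strand == "-":
        return (aln.chrom, "-", aln.start)
    raise ValueError(f"unexpected strand {aln.strand!r}")


def three_prime_profile(alignments: Iterable[InsertAlignment]) -> Counter:
    """Strand-specific, single-nucleotide counts of RNA 3' ends."""
    return Counter(rna_3prime_end(a) for a in alignments)


if __name__ == "__main__":
    reads = [
        InsertAlignment("chr1", 100, 140, "+"),
        InsertAlignment("chr1", 110, 140, "+"),
        InsertAlignment("chr1", 100, 150, "-"),
    ]
    print(three_prime_profile(reads))
```

Keeping the strand in the key preserves the sense/antisense distinction that the profiles rely on.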

Figure 1. mNET-Seq Methodology

(A) ChrRNA-seq and mNET-seq strategies. Pol II (blue) elongating complex (gray circle) and associated nascent RNA (red line) in chromatin. Orange asterisk depicts the 3'OH of nascent RNA. For ChrRNA-seq (top), fragmented nascent RNA is subjected to directional paired-end deep sequencing. For mNET-seq (bottom), DNA and RNA are digested with MNase and the Pol II-nascent RNA complex is precipitated with Pol II antibody. Isolated RNA is deep sequenced, and the 3' end nucleotide is uniquely mapped on the human genome.

(B) Pol II release from insoluble chromatin DNA. Chromatin DNA was digested with increasing amounts of MNase. Western blot used 8WG16 Pol II antibody. P, pellet; S, supernatant. IIo and IIa indicate phosphorylated and unphosphorylated Pol II.

(C) Nascent RNA distribution in the mNET-seq method. Nascent RNA was 32P-labeled by NRO reaction. Fractionated nascent RNA are nucleoplasm (Np), chromatin pellet (Chr (P)), and chromatin supernatant (Chr (S)). IP was with 8WG16 Pol II antibody. 35-100 nt RNA was purified from gel (red box). IPed Pol II was detected by western blot (bottom).

(D) ATP5G1 mNET-seq. Two biological replicates of mNET-seq/unph using 8WG16 Pol II antibody. ChrRNA-seq is shown as mNET-seq input. ChIP-seq (Pol II [8WG16], H3K4m3, and H3K36m3) data are from ENCODE project data sets (Consortium et al., 2012).

Finally, libraries were prepared from two biological replicates of HeLa native chromatin after Pol II 8WG16 IP. Deep sequencing was conducted using a reverse sequence primer to read the 3' ends of the RNA insert, which correspond to the RNA synthesis site in the Pol II active site (Figure 1A). mNET-seq data aligned to the human genome (hg19) were compared to 8WG16 chromatin IP (ChIP-seq) and ChrRNA-seq data, as shown for ATP5G1, a typical example of an actively expressed gene in HeLa cells (Figure 1D).
A lower-resolution cluster of genes expressed at varying levels is 
also presented (Figure S1B). Note that as mNET-seq identifies 
the 3' end of the transcript within the Pol II active site, TSS-associated reads will only be detected 30 nt or more downstream of the exact TSS. Modifications of histone H3, H3K4m3 and H3K36m3,
reflect active promoters and gene bodies, respectively. Strand- 
specific transcription activity was revealed by ChrRNA-seq. As 
expected, both replicates of mNET-seq/8WG16 (unph) display 
strong peaks at the active TSS, consistent with the previously 
described ChIP-seq/unph profiles. We therefore predict that 
this TSS-accumulated mNET-seq signal reflects Pol II pausing. 
Additionally, mNET-seq data revealed both sense and antisense 
transcription on active genes, as previously shown by GRO-seq 
and PRO-seq (Core et al., 2008; Kwak et al., 2013). 



Pol II CTD Phosphorylation-Specific Nascent RNA 
Profiles at TSS and TES 

A major benefit of our mNET-seq procedure is that it allows the 
use of different Pol II antibodies to precipitate modified Pol II- 
associated nascent transcripts. We elected to employ newly 
described specific monoclonal antibodies to detect CTD phos- 
phorylation-dependent nascent RNA profiles for S2P, S5P, and 
all CTD isoforms (Figure 2A) (Stasevich et al., 2014). We carried 
out further tests to confirm the specificity of these antibodies 
versus 8WG16. First we performed ELISA assays (Figure S2A) 
with synthetic peptides of 15 amino acids, containing two adjacent heptad repeats, either singly or doubly phosphorylated on S2, S5, or S7. As expected, 8WG16 bound with relative specificity to unphosphorylated or singly phosphorylated CTD peptides. CMA601 bound all CTD peptides with or without serine phosphorylation, whereas CMA602 and CMA603 bound CTD peptides containing S2P and S5P, respectively. We also performed IP Pol II western blots (Figure S2B) with these four antibodies under mNET-seq conditions and confirmed their specificity and IP efficiency. We finally performed Pol II ChIP analysis on three specific genes, comparing our monoclonal antibodies to commercial polyclonal antibodies (ab5095 [S2P] and ab5131 [S5P]) that are widely used for ChIP-seq assays (Perez-Lluch et al., 2011) (Figure S2C). Notably, matching ChIP profiles were observed for the different S2P- and S5P-specific antibodies. A potential concern with our mNET-seq protocol was that, as we only partially solubilize the chromatin pellet by MNase treatment, there may be selective release of different Pol II modifications. However, the chromatin pellet and supernatant following MNase treatment gave very similar patterns of Pol IIo and Pol IIa with all four antibodies, arguing against selective release of differentially modified Pol II (Figure 2B).

Figure 2. mNET-Seq with Different Phospho-CTD Modifications

(A) Diagram showing different Pol II antibody epitopes on CTD (Stasevich et al., 2014).

(B) Specificity of Pol II phosphorylation released from chromatin following MNase treatment with indicated Pol II antibodies.

(C) Meta-analyses of mNET-seq/unph+ph on TSS and TES of pA+ protein-coding genes (left) and histone genes (right). Read density (FPKM) of mNET-seq data was plotted around TSS (±0.5 kb) and TES (-0.5 kb to +3 kb). Data on pA+ and histone genes are represented as mean ± SEM. mNET-seq sense strand, blue; antisense strand, red.

(D) Meta-analyses of mNET-seq on TSS and TES of pA+ protein-coding genes. Ratio of read density (FPKM) of indicated mNET-seq data to mNET-seq/unph+ph data was plotted around TSS (±0.5 kb) and TES (-0.5 kb to +3 kb): unph, dark gray; S2P, blue; S5P, red. Line and shading represent mean ± SEM for each bin.

(E and F) mNET-seq profiles over TSS of TARDBP (E) and TES of CDK1 (F). Read density, reads per 10^8 sequences.

Based on previously published RNA-seq data (Lacoste et al., 2014), we found 11,560 (45%) RefSeq genes actively transcribed in HeLa cells. However, to avoid over-representation of ncRNA
(such as rRNA, tRNA, snoRNA, and snRNA) in the mNET-seq 
meta-analysis, we excluded genes overlapping these se- 
quences. We also excluded overlapping transcription units as 
these might bias average profiles (see Extended Experimental 
Procedures). We initially looked at meta-profiles over TSS and 
TES regions for all Pol II isoforms (unph+ph antibody). As ex- 
pected, bidirectional TSS mNET-seq peaks were detected and 
a wider, mainly sense peak beyond the TES. In contrast, the his- 
tone genes gave a flatter mNET-seq profile across these short 
poly(A) minus genes and diminished TSS antisense reads 
(Figure 2C). This clearly shows the specificity of our mNET-seq 
profiles. We next analyzed meta-profiles using the CTD phos- 
pho-specific Pol II antibodies (Figure 2D). To allow cross-com- 
parison between the different antibodies, the data are presented 
as a ratio with mNET-seq reads obtained for total Pol II (unph+ph). 
Remarkably, only mNET-seq/unph gave a bidirectional TSS pro- 
file, whereas S2P and S5P show a gradual increase from low TSS 
signals to higher signals in the GB. The TES meta-profiles revealed the expected dominance of S2P. Single-gene TSS and TES mNET-seq profiles (TARDBP and CDK1, respectively; Figures 2E and 2F) were consistent with the meta-profiles. The marked differences in mNET-seq profiles observed for specific CTD phosphorylation were not seen for histone genes, which showed little difference other than higher unph reads across the genes (Figure S2D). Overall, mNET-seq profiles reveal remarkable CTD phosphorylation specificity for poly(A)+ protein-coding genes.
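
The cross-antibody comparison divides phospho-specific mNET-seq density by total Pol II (unph+ph) density and reports mean ± SEM per bin. The sketch below is one possible implementation. It assumes that the ratio is computed per gene and then averaged across genes in each bin; the text does not state whether ratios were taken per gene or from averaged densities. Bins where a gene has no total signal are simply skipped. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def mean_sem(values: Sequence[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (SEM = sample SD / sqrt(n))."""
    if not values:
        return (math.nan, math.nan)
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return (m, sem)


def ratio_metaprofile(
    specific: Sequence[Sequence[float]],
    total: Sequence[Sequence[float]],
) -> list[tuple[float, float]]:
    """Per-bin mean and SEM of the per-gene ratio specific/total.

    specific, total: one row per gene, one column per position bin, holding
    read densities (e.g. FPKM) for a phospho-specific antibody and for total
    Pol II (unph+ph). Bins where a gene has no total signal are skipped.
    """
    n_bins = len(total[0])
    profile = []
    for b in range(n_bins):
        ratios = [s[b] / t[b] for s, t in zip(specific, total) if t[b] > 0]
        profile.append(mean_sem(ratios))
    return profile


if __name__ == "__main__":
    s5p = [[1.0, 2.0], [3.0, 4.0]]
    total = [[2.0, 4.0], [3.0, 8.0]]
    print(ratio_metaprofile(s5p, total))
```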

Exon Tethering to Pol II with CTD S5P for 
Co-Transcriptional Splicing 

The coupling of Pol II transcription to splicing is well established 
(Moore and Proudfoot, 2009). For example, altered Pol II elonga- 
tion speed can affect alternative splicing patterns (Ip et al., 2011;
Munoz et al., 2009), indicating that Pol II slows down near splice 
sites to promote spliceosome assembly. In particular, genome- 
wide analysis of nascent RNA by high-resolution tiling arrays 
in yeast showed that Pol II is paused over terminal exons but 
only for co-transcriptionally spliced genes (Carrillo Oesterreich 
et al., 2010). Additionally, precisely timed ChIP analysis in yeast revealed that Pol II CTD S5P accumulates over the 3'SS of
intron-containing genes (Alexander et al., 2010). Furthermore, 
this splicing-dependent Pol II pausing requires pre-spliceosome 
assembly (Chathoth et al., 2014). 

We were interested to determine whether our mNET-seq pro- 
files reflect the co-transcriptionality of splicing, but we observed 
unexpected patterns. First, we present the mNET-seq profile of a 
specific gene, TARS, comparing the four different Pol II antibody 
profiles (Figure 3A). Surprisingly, mNET-seq/S5P selectively detected prominent exon peaks. We had reasoned that mNET-seq would specifically identify the nascent transcript 3'OH in the Pol II active site. However, as previously noted (Churchman
and Weissman, 2011), co-precipitated spliceosomes contain 
3'OH RNA derived from splicing intermediates that also yield 
NET-seq signal. Remarkably, single-nucleotide analysis of 
TARS exon 9 reveals that the major S5P peaks exactly match 
the 5'SS (Figure 3A, lower panel). These observations suggest 
that S5P detects the initial 5'SS cleavage intermediate, indi- 
cating that spliceosome complex C is associated with Pol II 
CTD S5P. We next performed meta-analysis of mNET-seq 
comparing all four antibodies over gene regions that are co-tran- 
scriptionally spliced as judged by fused exon reads (Figure 3B). 
As for TARS, these actively spliced introns give a strong 5'SS 
S5P-specific signal indicative of co-precipitated spliceosome 
C complex. Significantly, we also detect selective accumulation 
of S5P reads over the downstream exon. Apparently, Pol II with an S5P-modified CTD pauses over exon sequences, allowing time for the spliceosome to perform the first catalytic step. This will generate
intronic lariats and cleaved upstream exons, which remain teth- 
ered to the downstream positioned Pol II. To further substantiate 
this mechanism, we carried out additional meta-analysis of pre- 
dicted included or excluded exons from final spliced mRNA in 
HeLa cells by analyzing total poly(A)+ RNA-seq data (Katz
et al., 2010). Again, we demonstrate a strong 5'SS S5P signal 
for included but not excluded exons (Figure 3C). We finally pre- 
sent mNET-seq analysis for five intronless genes that show no 
clear S5P peaks (Figure S3). 
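
The 5'SS signal is a single-nucleotide feature. After the first catalytic step, the cleaved upstream exon carries a 3'OH at its last nucleotide, so a 5'SS meta-analysis amounts to summing RNA 3'-end counts at fixed offsets from each exon end. The sketch below illustrates this. It assumes a precomputed strand-specific map of 3'-end counts and a list of exon ends, both in hypothetical formats, and it measures offsets in transcript orientation so that positive offsets point into the intron.

```python
from typing import Iterable, Mapping

Position = tuple[str, str, int]  # (chrom, strand, 0-based coordinate)


def splice_site_metaprofile(
    three_prime_counts: Mapping[Position, int],
    exon_ends: Iterable[Position],
    window: int = 5,
) -> dict[int, int]:
    """Sum RNA 3'-end counts at offsets -window..+window around each 5'SS.

    exon_ends: the last exonic nucleotide of each exon (offset 0). Offsets
    follow transcript orientation, so positive values point into the intron.
    """
    profile = {offset: 0 for offset in range(-window, window + 1)}
    for chrom, strand, pos in exon_ends:
        direction = 1 if strand == "+" else -1
        for offset in profile:
            key = (chrom, strand, pos + direction * offset)
            profile[offset] += three_prime_counts.get(key, 0)
    return profile


if __name__ == "__main__":
    counts = {("chr1", "+", 99): 10, ("chr1", "+", 100): 1,
              ("chr2", "-", 50): 7, ("chr2", "-", 49): 2}
    ends = [("chr1", "+", 99), ("chr2", "-", 50)]
    print(splice_site_metaprofile(counts, ends, window=1))  # {-1: 0, 0: 17, 1: 3}
```

A sharp peak at offset 0, with little signal at neighboring offsets, is the kind of pattern expected from the splicing intermediate rather than from elongating Pol II.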

The surprising observation that mNET-seq/S5P profiles show 
a strong 5'SS signal merited further experimental validation. We 
therefore employed the splicing inhibitor pladienolide B (Pla-B), 
which is known to inactivate the SF3b sub-complex of U2 snRNP
(Kotake et al., 2007), required for intronic branch point
recognition as a prelude to the first catalytic step of intron splicing. We
initially confirmed the effect of Pla-B treatment on two specific 
genes (BRD2 and BZW1). First, nucleoplasmic RNA from control 
DMSO or Pla-B-treated cells was sequenced (NpRNA-seq), and 
the patterns obtained across these two genes showed a clear 
increase in intron retention (Figure 4A). This was confirmed by 
RT-PCR with specific exon primers (Figure 4B) where Pla-B 
treatment enhanced intron retention in both cases. Notably, 
mNET-seq/S5P analysis across these same two genes showed 
the usual high 5'SS peaks for the control but not Pla-B-treated 
cells (Figure 4A). To establish generality, we performed meta- 
analysis over 1,051 actively spliced introns (Figure 4C). As 
before, we saw the high 5'SS peak and enrichment of S5P reads 
over the downstream exon. Dramatically, Pla-B treatment 
eradicated the 5'SS signal and substantially reduced downstream
exon pausing. These results confirm that the 5'SS mNET-seq/S5P
signal that we detect genome wide for spliced exons is indeed a
bona fide splicing intermediate.

Figure 3. Exon Tethering to Ser5-Phosphorylated Pol II Complex

(A) TARS mNET-seq profile with different antibodies, followed by expanded view of exon 9 5'SS. S5P-dominant peaks are indicated by black arrows.

(B) Meta-analysis of mNET-seq profiles over 3' ends (left) and 5' ends (right) of co-transcriptionally spliced exons. Single asterisk, peak at 3' end of spliced exon; double asterisk, accumulation of Pol II at 5' end of spliced exon.

(C) Meta-analysis of mNET-seq data over 5'SS of included exons (orange) and excluded exons (green).

For (B) and (C), bars represent mean ± SEM for each base.

We also studied the mutually exclusive exons 9 and 10 of
PKM. RT-PCR and ChrRNA-seq analyses show that exon 10 is
predominantly included in mature PKM transcripts in HeLa cells
(Figures S4A and S4B) (David et al., 2010). Furthermore, the
mNET-seq/S5P profile gave the characteristic 5'SS signal at
the end of exon 10 but not exon 9 of PKM (Figure S4B). To
experimentally manipulate this well-known case of alternative
splicing, we performed S5P analysis on chromatin from cells
depleted by siRNA treatment of the splicing-regulatory protein
PTBP1 (Figure S4C), which is known to be required for the
alternative splicing of PKM exon 10 (David et al., 2010). As shown
by lower-resolution and single-nucleotide-resolution mNET-seq
profiles, the 5'SS peak is reduced at the end of exon 10 but
enhanced at the end of exon 9 after depletion of PTBP1
(Figure S4E). Again, this splice-site switch is confirmed by RT-PCR
analysis (Figure S4D). Overall, these data on PKM exon 9 and
10 alternative splicing fully corroborate the general upstream
exon-tethering pattern for actively spliced exons demonstrated
by our mNET-seq analysis.

Figure 4. Effect of Splicing Inhibition on mNET-Seq and ChrRNA-Seq Profiles

(A) mNET-seq and NpRNA-seq on BRD2 and BZW1 from HeLa cells treated with DMSO (blue) or splicing inhibitor Pla-B (red). Green asterisks denote 5'SS peaks.

(B) RT-PCR analysis of indicated exon splicing showing unspliced and spliced RNA products.

(C) Meta-analysis of mNET-seq/S5P around exon 5'SS and 3'SS from DMSO (blue) and Pla-B (red) treated HeLa cells. S5P peaks at 5' and 3' ends of spliced exons are shown by orange and green asterisks, respectively. Bars represent mean ± SEM for each base.

(D) Co-transcriptional splicing model. 3'OH of upstream exon (UpEx, dark red) and RNA in Pol II catalytic site are shown as green and orange asterisks, respectively. 3'OH of the UpEx RNA is protected in S5P Pol II-spliceosome C complex (gray circle). S5P Pol II pauses over DwEx.

Figure 5. Pre-miRNA Biogenesis from Protein-Coding Gene Introns

(A-D) mNET-seq with different Pol II antibodies versus ChrRNA-seq over intronic pre-miRNAs. (A) mNET-seq data on PANK3 with magnified view over hsa-mir-103a-1 denoted by a black rectangle. The pre-miRNA is indicated by an orange arrow (top). Three other pre-miRNAs are also shown: hsa-mir-27b (B), hsa-mir-26b (C), and hsa-mir-181a/b-1 (D). Drosha cleavage sites are identified by dashed orange lines, and asterisks indicate frequent cleavage sites (5' end, purple; 3' end, green). Small RNA-seq data are shown below (green).

(E) Model of co-transcriptional pre-miRNA biogenesis. Pre-miRNA DNA and hairpin RNA are shown in green. Co-transcriptional Drosha cleavage (scissors) and spliceosome (gray) are shown with 3' ends of cleaved RNA (purple asterisk) and pre-miRNA (green asterisk) tethered to the phosphorylated CTD. Pre-miRNA release from the transcription complex may be fast (dark red arrow) or slow (blue arrows).

Co-Transcriptional Pre-miRNA Biogenesis 

Most pre-miRNAs are located within the introns of protein-coding
genes and are excised co-transcriptionally by the microprocessor
complex, containing Drosha and DGCR8 (Morlando et al., 2008;
Pawlicki and Steitz, 2008). Drosha cleavage generates 3'OH ends
that can potentially be detected by mNET-seq. Because the RNA
cleavage sites on pre-miRNAs generated by the microprocessor
complex are quite variable, we individually checked the mNET-seq
profiles for highly expressed pri-miRNAs in HeLa cells. Our
analysis began with PANK3, which harbors hsa-mir-103a-1 in its
penultimate intron (Figure 5A). Its mNET-seq profiles show high
S5P 5'SS peaks indicative of exon tethering for each exon except
exon 5, which precedes the pre-miRNA-containing intron 5. Instead,
a peak is detected with S5P- and S2P-specific antibodies over the
pre-miRNA within this intron. The single-nucleotide resolution
profile over hsa-mir-103a-1 (Figure 5A, bottom) shows two
mNET-seq/S2P peaks defining the pre-miRNA 5' and 3' ends.
Notably, only the 5' end is detected by mNET-seq/S5P. Similarly,
hsa-mir-27b in an intron of C9orf3 gives S5P and S2P peaks at both
ends of the pre-miRNA (Figure 5B). In contrast, for intronic
hsa-mir-26b (CTDSP1), only a 5' peak is detectable (Figure 5C). A
further three examples of intronic pre-miRNAs show both 5' and 3'
pre-miRNA peaks detectable by either mNET-seq/S5P or S2P
(Figures S5A-S5C). These specific 5' and 3' end pre-miRNA peaks
correspond to the 3' ends of the cleaved intron and the pre-miRNA,
which reaffirms the co-transcriptionality of pre-miRNA processing.
As with spliceosomes, we suggest that microprocessor is
co-precipitated with Pol II so that 3'OH intermediates of Drosha
cleavage are detected by mNET-seq. Two pre-miRNAs (hsa-mir-181a-1
and hsa-mir-181b-1) are located in the MIR181A1HG intron
(Figure 5D). Although the ENCODE project data (Consortium et al.,
2012) show that both mature miRNAs are expressed in HeLa cells,
only hsa-mir-181a-1 yields significant mNET-seq peaks. This
correlates with ChrRNA-seq analysis showing a signal window over
hsa-mir-181a-1, but not hsa-mir-181b-1. We infer that only
hsa-mir-181a-1 is co-transcriptionally processed. Evidently,
mNET-seq distinguishes co-transcriptional from post-transcriptional
pre-miRNA processing. We also note that the variable mNET-seq
double peaks (e.g., hsa-mir-27b) and single peaks (e.g.,
hsa-mir-26b) suggest kinetic differences in pre-miRNA biogenesis.
Some pre-miRNAs (such as pre-miRNA-26b and 181a-1) may be
released immediately from the Pol II elongation complex after
microprocessor cleavage. Other pre-miRNAs (such as pre-miRNA-27b
and let-7g) may be released more slowly, with the 3' ends of the
pre-miRNA still tethered to the Pol II elongation complex
(Figure 5E, model). Significantly, S2P and S5P generally show
larger peaks than unph for pre-miRNA processing, suggesting that
CTD phosphorylation is important for co-transcriptional pre-miRNA
biogenesis. For the MIR17HG locus containing six tandem
pre-miRNAs (Figure S5D), Drosha co-transcriptionally cleaves the
outer pre-miRNAs. However, the inner pre-miR-18a and pre-miR-19a
appear to be processed post-transcriptionally, as judged by a lack
of mNET-seq peaks and the absence of a hole in the ChrRNA-seq
profile over these sequences (Conrad et al., 2014).

Pol II Pausing Regulated by CPA Factors at TES 

To establish the impact of CPA factors on mNET-seq profiles 
over and 3' to TES, we depleted CPA (CPSF73 and CstF64+ 
CstF64t) and Xrn2 by siRNA treatment (Figure S6A, left panels). 
ChrRNA-seq analyses for specific genes demonstrated clear 
Pol II termination defects after depletion of CPA factors (Fig- 
ure S6A, right panels). Double-knockdown of CstF64+CstF64t 
proteins was necessary to see a full termination defect, presum- 
ably due to their functional redundancy in HeLa cells (Yao et al., 
2012). Xrn2 knockdown showed no significant termination
defect, as suggested previously (Brannan et al., 2012). Possibly,
like CstF64, this factor acts redundantly with other termination
factors. Interestingly, Xrn2 depletion increased transcript levels
within the GB, suggesting a major role for Xrn2 in nuclear
turnover (Davidson et al., 2012). We also performed ChrRNA-seq
analysis for histone genes (Figure S6B). Here, CPSF73 still 
showed a clear termination defect consistent with the known 
association of CPSF with the histone 3' processing machinery 
(Kolev and Steitz, 2005). In contrast, CstF64+CstF64t or Xrn2 
knockdowns showed no termination defect. Notably, loss of 
Xrn2 significantly increased histone gene reads, again indicating 
a major role in histone mRNA turnover. 

To extend our termination studies to mNET-seq, we principally 
analyzed CTD S2P profiles, as these are most likely to show
effects on 3' end processing (Ahn et al., 2004; Hirose and Manley,
1998; McCracken et al., 1997). However, we also performed S5P
and unph meta-analyses in CPSF73-depleted cells. Interestingly,
depletion of CPSF73 substantially reduced Pol II unph,
S2P, and S5P pausing over the TES (Figure 6A). Similarly,
CstF64+CstF64t double-knockdown reduced TES pausing. In
contrast, Xrn2 knockdown showed no significant difference from
the siLuc control (Figure 6B). We also observe that S2P profiles
upon knockdown of CPA factors crossed over the siLuc control
profile approximately 2.5 kb downstream of the TES, reflecting
expected transcriptional termination defects (Figures 6A and
6B). These mNET-seq meta-analyses were complemented by
ChrRNA-seq (Figure 6C), where meta-analysis of CPSF73 knockdown
gave a clear termination defect immediately following the
TES, whereas CstF64+CstF64t double-knockdown showed a
termination defect further downstream. Again, specific genes
are shown from both our mNET-seq and ChrRNA-seq data 
sets and show similar trends to those seen in meta-analyses 
after CPA knockdown (Figures S6C and S6D). 

3' End Termination Machinery Regulates Levels of
Promoter-Associated RNA 

Although RNA cleavage sites have previously been identified
near the TSS (Almada et al., 2013), the factors involved in this
process have not been determined. Because CPSF73 contains
the endonuclease activity, it could potentially cleave
nascent RNA near the TSS by recognizing cryptic PAS. We
therefore performed meta-analysis across TSS using the 
mNET-seq data obtained from knockdown of CPA factors 
and Xrn2. Interestingly, we observe an equivalent increase in 
TSS-associated S2P Pol II pausing on both mRNA and PROMPT 
strands after depletion of CPA factors and Xrn2 (Figures 7A and
7B). Notably, this effect is specific for S2P as S5P or unph meta- 
analysis following CPSF73 knockdown did not show a change in 
TSS pausing (Figure 7A). S2P meta-analysis of CstF64+CstF64t 
double-knockdown shows an average 3.6-fold increase as 
compared to siLuc (Figure 7B, top). Also, CPSF73 and Xrn2 
knockdowns both show an average 2.3-fold increase in Pol II 
pausing (Figures 7A, middle, and 7B, bottom). The extent of
pausing varies: the effect is more focused for CPSF73 and
Xrn2 but more prolonged for CstF64+CstF64t on both mRNA
and PROMPT strands (Figures 7A and 7B). We also present
gene-specific examples to validate our TSS mNET-seq meta- 
analysis. FUS shows enhanced levels of TSS mNET-seq reads 
following CPSF73 knockdown, but only for S2P (Figure 7C). 
SLC30A6 also shows similar enhanced levels of TSS reads for 
S2P following each termination factor knockdown (Figure 7D). 
Figure 6. Nascent RNA within Pol II Complex at TES

(A) Meta-analysis of mNET-seq with indicated Pol II antibodies over TES regions (-0.5 to +7 kb) from siLuc (dark gray) and siCPSF73 (red) treated HeLa cells (left) is
shown. Also shown are RTIs of mNET-seq following CPSF73 knockdown (right). GB signals were divided by signals in a 2 kb region from TES (TES+2k) for RTI (see
Extended Experimental Procedures). Dashed line is median of siLuc. (**) p value < 8.52 x 10“^\ and (***) p value <2.17 x 10“^^ by two-sided Mann-Whitney test. 

(B) Meta-analysis of mNET-seq/S2P following termination factor knockdown over TES regions (top). siLuc (dark gray), siCstF64+siCstF64t (blue), and siXrn2 
(green). RTI of mNET-seq following indicated knockdown (bottom) is shown. (**) p value < 1.94 x 10“^^ by two-sided Mann-Whitney test; ns indicates no dif- 
ference between samples (p value = 0.9894 by two-sided Mann-Whitney test). 

(C) Meta-profiles of ChrRNA-seq following indicated knockdown over TES. siLuc (dark gray), siCPSF73 (red), siCstF64+siCstF64t (blue), and siXrn2 (green). 

(D) Model correlating Pol II pausing and PAS-dependent transcription termination at TES. RNA cleavage (scissors) by CPA complex (red circle) at PAS (orange 
triangle). Pol II elongation speed over 3' flank region is regulated by PAS recognition on average over a 3 kb region from TES. 

For (A)-(C), line and shading represent mean ± SEM for each bin. 
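The Figure 6 legend defines the RTI as GB signal divided by signal in the 2 kb window downstream of the TES, and compares RTI distributions with a two-sided Mann-Whitney test. A minimal Python sketch of both steps follows, under stated assumptions: "signal" is taken to mean reads per bp, only plus-strand genes are handled, the track layout (position to count) is hypothetical, and the p value uses the normal approximation to the U distribution with average ranks for ties but no tie or continuity correction. For real analyses, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` is the standard implementation.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def region_density(track: Dict[int, int], start: int, end: int) -> float:
    """Reads per bp over the half-open interval [start, end)."""
    return sum(track.get(p, 0) for p in range(start, end)) / (end - start)


def rti(track: Dict[int, int], tss: int, tes: int, window: int = 2000) -> Optional[float]:
    """GB density divided by density in [TES, TES + window); plus-strand genes only.

    Returns None when the downstream window has no reads.
    """
    downstream = region_density(track, tes, tes + window)
    if downstream == 0:
        return None
    return region_density(track, tss, tes) / downstream


def mann_whitney_two_sided(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided normal-approximation p value."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In use, one would compute `rti` for every gene in the control and knockdown tracks, drop genes returning `None`, and pass the two lists to the test.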
Figure 7. Promoter-Associated RNA Turnover Regulated by Termination Factors

(A) Meta-analysis of mNET-seq with indicated Pol II antibodies over TSS regions (-0.5 to +0.5 kb) from siLuc (dark gray) and siCPSF73 (red) treated HeLa
cells (left). 

(B) Meta-analyses of mNET-seq/S2P following knockdown of CstF64+CstF64t (blue) and Xrn2 (green) at TSS (left). 

(C) mNET-seq of FUS with indicated Pol II antibodies and ChrRNA-seq from siLuc (dark gray) and siCPSF73 (red) treated HeLa cells. Increased mNET-seq/S2P
signals following depletion of CPSF73 are denoted by blue arrows. 

(D) mNET-seq/S2P maps with indicated knockdowns around TSS of SLC30A6 on both mRNA and PROMPT strands. 

(E) Model showing effects of CPA and Xrn2 at TSS. S2P Pol II-CPA complex (red circle) cleaves TSS-associated nascent RNA, and Xrn2 (purple) degrades
cleaved RNA from 5' end to 3' end over a region of 250 bp from TSS. 

For (A) and (B), line and shading represent mean ± SEM for each bin. 

We quantitated the effects of termination factor knockdown by
measuring the ratio change between mNET-seq reads over the TSS
as compared to the GB; we refer to this as the Escaping Index
(EI). We also calculated any changes in read values across the GB
(Figure S7; Extended Experimental Procedures). The distribution of
EI values clearly shows that depletion of all three factors
increases promoter-associated S2P Pol II pausing but has no effect
on S2P Pol II distribution across the GB. These results indicate
that CPA factors and Xrn2 are involved in restricting the levels of
promoter-associated non-productive transcripts.
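The Escaping Index compares TSS-proximal read signal with GB signal and asks how that ratio changes on knockdown. The exact windows are given only in the Extended Experimental Procedures, so this Python sketch uses hypothetical choices: a TSS window of TSS to TSS+250 bp (the region where the S2P increase is reported), a GB window from TSS+250 bp to the TES, signal expressed as reads per bp, and the per-gene change reported as log2(knockdown ratio / control ratio). Only plus-strand genes are handled, and the track layout (position to count) is assumed.

```python
import math
from typing import Dict, Optional, Tuple


def density(track: Dict[int, int], start: int, end: int) -> float:
    """Reads per bp over the half-open interval [start, end)."""
    return sum(track.get(p, 0) for p in range(start, end)) / (end - start)


def tss_gb_ratio(track: Dict[int, int], tss: int, tes: int,
                 tss_window: int = 250) -> Optional[float]:
    """TSS-window density over GB density (hypothetical windows); None if GB is empty."""
    gb = density(track, tss + tss_window, tes)
    if gb == 0:
        return None
    return density(track, tss, tss + tss_window) / gb


def log2_ratio_change(control: Dict[int, int], knockdown: Dict[int, int],
                      genes: Dict[str, Tuple[int, int]],
                      tss_window: int = 250) -> Dict[str, float]:
    """Per-gene log2(knockdown TSS/GB ratio divided by control TSS/GB ratio)."""
    changes = {}
    for name, (tss, tes) in genes.items():
        c = tss_gb_ratio(control, tss, tes, tss_window)
        k = tss_gb_ratio(knockdown, tss, tes, tss_window)
        if c and k:  # skip genes with no GB reads or no TSS reads
            changes[name] = math.log2(k / c)
    return changes
```

A distribution of values shifted above zero would indicate more TSS-proximal signal relative to the GB after knockdown, which is the pattern the text describes for S2P.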

In order to examine whether CPA factors could directly bind to 
nascent RNA near TSS, we analyzed in vivo cross-linking and 
immunoprecipitation (CLIP) data for genome-wide alternative 
polyadenylation at TES (Martin et al., 2012). Surprisingly, all 
CPA factors, including CPSF73, CstF64, CstF64t, CPSF160, 
CPSF30, and CFIm25 proteins, are significantly detected on 
both strands within 500 nt of the TSS. CPSF73 in particular shows
substantial peaks 160 nt upstream and 80 nt downstream of the
TSS (Figure S7C and Table S1). Together with our mNET-seq/
S2P results, we conclude that the CPA complex cleaves not 
only pre-mRNA at the PAS to promote 3' end termination but 
also promotes promoter-associated premature termination (Fig- 
ure 7E). Notably, Xrn2 plays a unique role in TSS but not in TES 
termination. 

DISCUSSION 

Our mNET-seq analysis reveals precise maps of both nascent 
RNA and the associated Pol II “CTD code.” We employed 
recently evaluated high-affinity, high-specificity monoclonal
antibodies to Pol II CTD S5P, S2P, unph, and unph+ph (Figure S2;
Stasevich et al., 2014) in our mNET-seq analysis. Interestingly, 
our mNET-seq data reveal significant differences in CTD modifi- 
cation profiles across mammalian protein-coding genes as 
compared to previous studies. In particular, we detect predom- 
inantly low or unphosphorylated CTD over the TSS region (at 
least lacking S5P and S2P modification). Furthermore, most de- 
tected S5P signal is found in the GBs, where it is particularly 
associated with actively spliced exons. Finally, although we 
find that S2P signal is more associated with TES regions (consis- 
tent with previous studies), we demonstrate a redistribution of 
this CTD mark to TSS following CPA depletion. Several explana- 
tions may account for the differences between our mNET-seq 
data and previous studies. First, mNET-seq does not involve
cross-linking (by formaldehyde), which is by necessity used in
ChIP analysis. The possibility that cross-linking distorts the
native chromatin structure remains a concern. Similarly, 
mNET-seq detects nascent transcripts at single-nucleotide res- 
olution, which cannot be achieved by GRO-seq analysis. Even 
though PRO-seq analysis does give single-nucleotide resolution, 
the act of isolating nuclei, treating with sarcosyl, and then 
carrying out an in vitro transcription reaction (using modified nu- 
cleotides) as in both GRO-seq and PRO-seq protocols may 
distort the native transcription profiles of genes. Clearly, in the 
future, we can extend our analysis to include other CTD phos- 
phorylation marks using appropriate Pol II antibodies. For 
example, the CTD S7P mark is important to recruit Integrator 
complex to snRNA genes, which regulates 3' end processing 
and termination (Egloff et al., 2007). Mutation of CTD T4 specif- 
ically represses histone gene expression by blocking 3' end 
processing (Hsin et al., 2011). Another CTD modification, Y1P,
stimulates the binding of elongation factor Spt6 and blocks
recruitment of termination factors in yeast (Mayer et al., 2012).

It remains possible that the mammalian CTD code differs
significantly from the likely simplified code of budding
yeast. Notably, the yeast CTD has only 26 heptad repeats,
and these are near identical, unlike the more variable
mammalian heptad repeats. Possibly, the high S5P TSS signals
observed in yeast are replaced by other CTD or indeed histone 
marks in higher eukaryotes. Furthermore, few yeast genes 
possess introns so that the dominant presence of S5P marks 
over mammalian exons would be less quantitatively significant 
in yeast. Even so, it has been shown that yeast introns display
high S5P signals near their 3' ends (Alexander et al., 2010),
similar to the mammalian S5P splicing association described here.

A remarkable feature of our mNET-seq data is that we readily 
detect RNA 3' ends formed as RNA-processing intermediates 
through co-association of RNA-processing complexes with 
elongating Pol II. In particular, the Pol II CTD S5P mark, previ- 
ously thought to be mainly associated with TSS events such as 
co-transcriptional capping and early transcriptional elongation 
(Hsin and Manley, 2012), plays a major role in splicing. Thus, 
5'SS peaks of mNET-seq/S5P are detected at the end of co- 
transcriptionally spliced exons (Figure 3), indicating that the 3' 
cleaved upstream exon within the spliceosome is associated 
with Pol II elongation complexes in an S5P-dependent manner. 
We also note that the mNET-seq S5P reads are particularly 
high over spliced exons, suggesting that S5P Pol II pauses 
over functional exons allowing time for U2 snRNP-mediated acti- 
vation of 5'SS cleavage. Indeed, we demonstrate this by directly 
inhibiting U2 snRNP function (Figures 4 and S4). Overall, we find 
that the 5'SS cleavage intermediate is retained within the spli- 
ceosome C complex associated with Pol II S5P until subsequent 
ligation with the downstream exon can occur (Figure 4D, model). 
In effect, we provide genome-wide support for exon tethering to 
Pol II as previously predicted from studies on transfected gene 
constructs wherein co-transcriptional intron cleavage did not 
prevent exon splicing across a discontinuous intron (Dye et al., 
2006). We anticipate that our mNET-seq technology will provide 
new ways to unravel the complexity of the co-transcriptional 
splicing mechanism. 

A surprising aspect of our mNET-seq analysis is that we do not 
detect a peak of signal associated with pre-mRNA 3' end pro- 
cessing (TES meta-analysis in Figure 2C and Figures 6A and 
6B). This contrasts with the splicing-associated 5'SS and Drosha
cleavage sites that are highly prevalent in our data (Figures 3, 
4, and 5). We predict that 3' end cleavage (coupled with polyade- 
nylation) may cause rapid mRNA release from the Pol II complex 
and so escape mNET-seq detection, as in the pre-miRNA fast-
release model (Figure 5E). Although 3' end processing is known 
to be required for Pol II termination (Proudfoot, 2011), it is also 
thought that Pol II pausing at TES regulates both 3' end process- 
ing and subsequent transcription termination (Gromak et al., 
2006; Nag et al., 2007). We examined the effect on Pol II pausing 
at TES following depletion of CPA components. Consistent with 
previous reports, ChrRNA-seq reveals that depletion of CPSF73 or
CstF64 causes transcriptional termination defects on
protein-coding genes (Figure 6). Interestingly, our mNET-seq data also
reveal that depletion of CPA factors causes significantly less 
pausing immediately downstream of TES (<3 kb from TES) and 
then more Pol II occupancy at further downstream regions 
(>3 kb from TES) compared to control cells. This indicates that
Pol II elongation speed is regulated by the CPA complex, which 
may be an important factor in mediating transcription termination 
(Figure 6D). We also demonstrate that no significant termination 
defect occurs following the TES upon knockdown of Xrn2
(Figure 6A, bottom). This observation contrasts with our previous reports
based on plasmid transfection studies (West et al., 2004). How- 
ever, it has been shown more recently that Xrn2 has a required 
partner protein, TTF2, for transcription termination (Brannan 
et al., 2012). It seems likely that Xrn2-associated termination is 
redundant with other termination factors. 

Unexpectedly, mNET-seq analysis showed a significant
increase specifically in S2P Pol II pausing at the TSS (<250
bp) for both mRNA and PROMPT transcription upon CPA factor
and Xrn2 depletion. This suggests that CPA and Xrn2 are
involved in premature termination at the TSS, consistent with 
a previous report (Brannan et al., 2012). Although CPA factors 
and Xrn2 affect S2P Pol II occupancy at TSS, they show no dif- 
ference in S2P Pol II distribution across the GB. Recent studies 
have pointed toward differences between promoter-proximal 
termination for mRNA sense or antisense RNA (Almada et al., 
2013; Ntini et al., 2013). Antisense TSS transcripts (PROMPTs)
are thought to utilize cryptic PAS close to the TSS, whereas 
sense TSS transcripts may have reduced occurrence of cryptic 
PAS. Those that are present may be blocked by nearby 5'SS U1 
snRNP recruitment (Kaida et al., 2010). These apparent
differences in cryptic PAS usage between PROMPTs and sense
TSS-associated transcripts may favor productive sense over 
non-productive antisense transcription. However, our mNET- 
seq data suggest that CPA factors and Xrn2 play equivalent 
roles in restricting sense and antisense TSS transcription. 
Thus, their depletion causes an equivalent increase in S2P 
Pol II pausing in both transcriptional directions. Also, we show 
by CLIP analysis that CPA factors are directly and equally asso- 
ciated with these two transcript classes (Martin et al., 2012). Our 
data suggest that transcriptional directionality at TSS is unlikely 
to be regulated by CPA-mediated termination. Rather, both
sense and antisense TSS-associated transcripts are restricted 
by normally TES-associated termination factors. Indeed, we 
observe a redistribution of S2P Pol II from the TES to the TSS 
following CPA factor and Xrn2 knockdown. This argues for close 
interconnections between both ends of the Pol II transcription 
unit, as previously demonstrated by 3C analysis (Ansari and 
Hampsey, 2005; O’Sullivan et al., 2004; Tan-Wong et al., 
2012). Several gene-specific analyses in mammals have re- 
ported the co-association of CPA with transcription initiation 
factors. Thus, CPSF is a known component of some TFIID com- 
plexes (Dantonel et al., 1997), and CstF has been shown to 
associate with TFIIB (Wang et al., 2010). Also, mutating the 
PAS depleted promoter-associated transcription factors and 
increased promoter-associated Pol II CTD S2P (Mapendano 
et al., 2010). Finally, the elongation factor TFIIS has been shown 
to promote release of paused TSS transcripts in Drosophila 
(Adelman et al., 2005), and this may in turn relate to CPA pro- 
moter effects. 

Overall, mNET-seq maps nascent transcription at single-nucleotide
resolution, showing both Pol II pausing and associated
co-transcriptional RNA cleavage. Importantly, this method
can be applied genome wide to check for modified polymerase
occupancy (even Pol I and Pol III) by selecting a range of different
antibodies. We anticipate that mNET-seq will expand our knowledge
of how different nascent RNAs are associated with specific
“CTD codes.”

EXPERIMENTAL PROCEDURES 
Antibodies and siRNA 

Antibodies and siRNA information are available in the Extended Experimental
Procedures. In outline, siRNA treatment was carried out for 3 days prior to cell
harvesting. The efficiency of protein depletion was confirmed by western blot
with appropriate antibodies.

Cell Culture, NRO Assay, and RT-PCR

Cell culture and NRO assay were as previously described (Nojima et al.,
2013). RT-PCR and primers are described in the Extended Experimental 
Procedures. 

In Vivo Splicing Inhibition 

HeLa cells were treated with either DMSO (0.1%) or Pla-B (1 μM) for 4 hr. Pla-B
was purchased from Santa Cruz (sc-391691). 

RNA-Seq Methods 

Preparation of chromatin and nucleoplasmic RNA was previously described 
(Nojima et al., 2013). For mNET-seq, isolated chromatin was incubated with 
MNase (40 U/μl). MNase was inactivated by EGTA, and the insoluble chromatin
was removed by centrifugation. IP was performed from the supernatant using
specific Pol II antibody-conjugated beads for 1 hr. IPed RNA was 5' end
phosphorylated by polynucleotide kinase treatment of the washed beads. Purified RNA
was fractionated on denaturing acrylamide gels, and a 35-100 nt fraction was
isolated. RNA libraries were prepared according to the manual of the TruSeq Small
RNA Library Prep Kit (Illumina). The reads were generated on a HiSeq 2000/2500
(Illumina). For full methods, see the Extended Experimental Procedures.

Data Pre-Processing 

mNET-seq data adaptors were trimmed using Cutadapt (v1.1) (Martin, 2011).
The remaining paired reads were aligned to the reference human genome
(hg19) using TopHat (v2.0.9) (Kim et al., 2013), allowing only one alignment
to the reference. The last nucleotide incorporated by the polymerase was
defined as the 5' end of read two (green arrow, Figure 1A) of the pair, with
the directionality indicated by read one (blue arrow, Figure 1A), and then the
properly aligned read pairs were trimmed to keep solely the 5' nucleotide of
read two. ChrRNA-seq and nucleoplasmic RNA-seq data were aligned using
the same version of TopHat but allowing for the read pairs to be separated
by 3 kb. Further details of data pre-processing and bioinformatic analysis are
available in the Extended Experimental Procedures.
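The rule above for locating the last incorporated nucleotide (5' end of read two, strand taken from read one) can be expressed as a small function. This Python sketch is an illustration, not the authors' code: it uses a hypothetical `Mate` record whose fields mirror pysam's `AlignedSegment.reference_start`, `reference_end` (one past the last aligned base), and `is_reverse`, and it assumes that in a properly aligned pair read two lies on the strand opposite to read one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Mate:
    """Hypothetical aligned-mate record (not a pysam class).

    start/end are 0-based, half-open reference coordinates of the aligned
    bases, mirroring pysam's reference_start/reference_end.
    """
    chrom: str
    start: int
    end: int
    is_reverse: bool


def polymerase_3prime_end(read1: Mate, read2: Mate) -> Tuple[str, int, str]:
    """Return (chrom, 0-based position, strand) of the nascent RNA 3' end."""
    if read1.chrom != read2.chrom or read1.is_reverse == read2.is_reverse:
        raise ValueError("mates are not a properly oriented pair")
    # Transcript strand is taken from read one.
    strand = "-" if read1.is_reverse else "+"
    # The 5' end of read two marks the last incorporated nucleotide; for a
    # minus-strand alignment the 5' end is the rightmost aligned base.
    pos = read2.end - 1 if read2.is_reverse else read2.start
    return read2.chrom, pos, strand


def count_3prime_ends(pairs: Iterable[Tuple[Mate, Mate]]) -> Counter:
    """Collapse read pairs into single-nucleotide 3'-end counts."""
    return Counter(polymerase_3prime_end(r1, r2) for r1, r2 in pairs)
```

The resulting counts per (chrom, position, strand) form the single-nucleotide tracks used for gene profiles and meta-analyses.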
SUMMARY

Major features of transcription by human RNA poly- 
merase II (Pol II) remain poorly defined due to a 
lack of quantitative approaches for visualizing Pol II 
progress at nucleotide resolution. We developed 
a simple and powerful approach for performing 
native elongating transcript sequencing (NET-seq) 
in human cells that globally maps strand-specific 
Pol II density at nucleotide resolution. NET-seq 
exposes a mode of antisense transcription that orig- 
inates downstream and converges on transcription 
from the canonical promoter. Convergent transcrip- 
tion is associated with a distinctive chromatin 
configuration and is characteristic of lower-ex- 
pressed genes. Integration of NET-seq with genomic 
footprinting data reveals stereotypic Pol II pausing 
coincident with transcription factor occupancy. 
Finally, exons retained in mature transcripts display 
Pol II pausing signatures that differ markedly from 
skipped exons, indicating an intrinsic capacity for 
Pol II to recognize exons with different processing 
fates. Together, human NET-seq exposes the topog- 
raphy and regulatory complexity of human gene 
expression. 

INTRODUCTION 

High-throughput sequencing analyses of transcription have 
discovered new classes of RNAs and new levels of regulatory 
complexity. Many of these results were obtained with two exper- 
imental strategies to measure RNA polymerase density genome 
wide. The first, RNA polymerase II (Pol II) ChIP-seq or ChIP-chip, 
identifies DNA bound to RNA polymerase. The second set of ap- 
proaches, global run-on sequencing (GRO-seq) and precision 
nuclear run-on and sequencing (PRO-seq), restarts RNA poly- 
merase in vitro with labeled nucleotides to purify and sequence 
nascent RNA (Core et al., 2008; Kwak et al., 2013). GRO-seq and
Pol II ChIP detect strong transcriptional pauses ~50 bp
downstream of many transcription start sites, demonstrating that
promoter-proximal pausing is more prevalent than initially observed
(Core et al., 2008; Krumm et al., 1992; Kwak et al., 2013; Muse 
et al., 2007; Rahl et al., 2010; Rougvie and Lis, 1988; Strobl and
Eick, 1992; Zeitlinger et al., 2007). Abundant unstable transcripts
upstream of and antisense to promoters revealed that divergent 
transcription is a common feature of eukaryotic promoters (Core 
et al., 2008; Neil et al., 2009; Preker et al., 2008; Seila et al., 
2008; Xu et al., 2009). Despite progress in understanding how 
these transcripts are terminated and degraded (Almada et al., 
2013; Ntini et al., 2013; Preker et al., 2008; Schulz et al., 
2013), their roles remain unknown (Wu and Sharp, 2013). Finally, 
recent studies confirm that splicing is largely co-transcriptional 
and splicing outcome is kinetically tied to elongation rate (Bhatt 
et al., 2012; Davis-Turak et al., 2015; Dujardin et al., 2014; Fong 
et al., 2014; Ip et al., 2011; de la Mata et al., 2003; Roberts et al.,
1998; Shukla et al., 2011; Tilgner et al., 2012). However, it has
been impossible to determine whether such kinetic coupling in 
human cells is mediated by pausing events genome wide, due 
to the high resolution required to measure pausing on short hu- 
man exons. 

The strongly stereotyped locations of promoter-proximal 
pauses and divergent antisense transcription can be exposed 
by averaging Pol II density from many genes (metagene anal- 
ysis) obtained at low resolution (Core et al., 2008; Neil et al., 
2009; Preker et al., 2008; Rahl et al., 2010; Seila et al., 2008; 
Xu et al., 2009). Yet, the precise architecture of promoter-asso- 
ciated transcriptional activity and of pausing outside of pro- 
moter regions has been obscured by the resolution limitations 
of current methodologies, preventing deeper insight into the 
underlying regulatory mechanisms. Indeed, the interplay be- 
tween chromatin structure, transcription factors, and the tran- 
scription machinery is largely undefined. Pol II ChIP-seq is typi-
cally limited to >200 bp resolution and lacks
strand specificity. GRO-seq is similarly limited to ~50 bp reso- 
lution, and although PRO-seq has higher resolution, both run- 
on methods require transcription elongation complexes to 
resume polymerization in vitro, a variable process sensitive to 
the experimental conditions and the Pol II pausing state (Core 
et al., 2008; Weber et al., 2014). Recently, we showed that 
Figure 1. A Robust and Simplified NET-Seq Approach for Human Cells

(A) Schematic view of the key steps of the human NET-seq approach. The transcription inhibitor, α-amanitin, is introduced at cell lysis and is maintained through all purification steps. Engaged RNA polymerase is purified through the isolation of chromatin. The 3' end of the co-purified nascent RNA (red) is ligated to a linker containing a mixed random hexameric sequence (blue) that serves as a molecular barcode. After cDNA synthesis, contaminant species are removed by hybridization. PCR amplification results in a DNA-sequencing library with the sequencing primer binding site proximal to the random hexamer barcode. Finally, the 3' ends of the sequenced nascent RNA are aligned to the human genome, yielding RNA polymerase density at nucleotide resolution. Analysis of the molecular barcode allows reads arising from DNA library construction artifacts to be filtered out.

(B) Representative western blot analysis of cellular fractions in HeLa S3 cells. Subcellular localization markers were also probed (chromatin marker, histone 2B; nucleoplasm marker, U1 snRNP70; cytoplasm marker, GAPDH).

(C) Histograms of the size-normalized ratio of subcellular RNA-seq reads that map to exons versus introns for each gene.

(D) Number of uniquely aligned reads per Pol II gene for two biological replicates from HeLa S3 cells (Pearson’s correlation, R = 0.97). 0.5 pseudocounts were added to genes with zero counts in one of the replicates. The data set with higher coverage was randomly downsampled to match the total number of reads of the other data set. See also Figure S1 and Table S1.



the extraordinary stability of the RNA-DNA-RNA polymerase 
ternary complex can be exploited to capture nascent RNA 
(Churchman and Weissman, 2011). Native elongating transcript 
sequencing (NET-seq) quantitatively purifies Pol II complexes 
and sequences the 3' end of nascent RNA to reveal the 
strand-specific position of Pol II with single-nucleotide resolu- 
tion. NET-seq detects all transcriptionally engaged Pol II, 
including productively transcribing Pol II, paused Pol II, and 
Pol II recovering from pausing (Churchman and Weissman,
2011).
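To make the nucleotide-resolution readout concrete, the following minimal Python sketch turns aligned reads into strand-specific Pol II density. It assumes, as a simplification of the library design in Figure 1A, that each read is the reverse complement of the nascent RNA and that the read's 5' end marks the RNA 3' end; the `Alignment` record and both function names are hypothetical stand-ins for a real alignment parser.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record (not a real SAM/BAM API)."""
    chrom: str
    start: int        # 0-based leftmost aligned genomic position
    end: int          # 0-based exclusive end of the aligned span
    read_strand: str  # "+" or "-": genomic strand the read aligned to


def nascent_3prime_end(aln: Alignment) -> tuple:
    """Return (chrom, position, rna_strand) of the nascent RNA 3' end.

    Assumption: the read is the reverse complement of the nascent RNA, so its
    5' end marks the RNA 3' end and the RNA lies on the opposite strand.
    """
    if aln.read_strand == "+":
        # Read 5' end is its leftmost base; the RNA is on the - strand.
        return aln.chrom, aln.start, "-"
    # Read 5' end is its rightmost base; the RNA is on the + strand.
    return aln.chrom, aln.end - 1, "+"


def polymerase_density(alignments) -> Counter:
    """Count nascent 3' ends per (chrom, position, strand)."""
    return Counter(nascent_3prime_end(a) for a in alignments)
```

Under these assumptions, each count corresponds to one engaged polymerase whose nascent RNA 3' end sat at that nucleotide when the cells were lysed.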

Here, we develop a NET-seq approach that quantitatively 
defines the full spectrum of transcriptional activity in a 
strand-specific manner and at nucleotide resolution in human 
cells. We find that many promoters display antisense transcrip- 
tion downstream of a promoter-proximal pause, resulting in 
convergent sense and antisense transcriptional activities that 
face one another in close proximity. Convergent transcription 
is associated with a distinct chromatin conformation and is a 
feature of lower-expressed genes, suggesting a possible regu- 
latory role. NET-seq reveals that Pol II density profiles differ be- 
tween retained exons, skipped exons, and introns in human 
cells, indicating generalized kinetic coupling of transcription 
and splicing. Human NET-seq is readily applicable to diverse 
cell types and provides a general strategy to study transcrip- 
tional complexity. 



RESULTS 

A Robust Human NET-Seq Methodology 

The first step of NET-seq purifies nascent RNA through its tight 
interaction with RNA polymerase. In yeast, this is achieved 
through an epitope-tagged Pol II subunit that enables highly 
quantitative purification and specific elution (Churchman and 
Weissman, 2011). In adapting NET-seq to human cells, we bio- 
chemically purify >99% of all engaged RNA polymerase in a 
highly specific manner that can be applied to any mammalian 
cell line or tissue without genetic modification (Figure 1A). This
method avoids using Pol II antisera, which could bias the popu- 
lation of isolated Pol II complexes due to posttranslational mod- 
ifications and epitope masking by heterogeneous Pol II binding 
partners and structural conformations. Instead, human NET- 
seq exploits the high stability of the RNA-DNA-RNA polymerase 
ternary complex, even in the presence of high salt and urea (Cai 
and Luse, 1987; Wuarin and Schibler, 1994), to purify engaged 
RNA polymerase, along with its nascent RNA, through an asso- 
ciation with chromatin after cellular fractionation into cytoplasm, 
nucleoplasm, and chromatin (Bhatt et al., 2012; Pandya-Jones 
and Black, 2009; Wuarin and Schibler, 1994). To prevent tran- 
scriptional run-on during fractionation, lysate is kept at <4°C, 
and α-amanitin, a potent transcriptional inhibitor (Lindell et al.,
1970), is included in every step. Through optimization of current 
fractionation approaches, we identified buffers and washing
conditions that cleanly purify >99% of elongating RNA polymer- 
ase II (C-terminal domain [CTD] Ser2-P, Ser5-P, and the general 
CTD hyper-phosphorylated form of Pol II) in the chromatin frac- 
tion (see Experimental Procedures and Figures 1B and S1A). 
Western blot analyses of Pol II isoforms and factors with well- 
defined subcellular localizations verify the stringency of our frac- 
tionation conditions (Figure 1B).

To confirm that our purification strategy specifically isolates 
nascent RNA, we sequenced the RNA in each fraction. Unpro- 
cessed RNA species, such as intron-containing Pol II transcripts 
and spacer-containing Pol I transcripts (Figures 1C and S1B), are
heavily enriched in the chromatin fraction. Importantly, the large 
majority of intron-containing RNAs observed in the nucleoplasm 
persist to the cytoplasm, indicating that these RNAs are prod- 
ucts of intron-retaining alternative splicing and not nascent tran- 
scripts (Figure S1C). Together, these results demonstrate that 
RNA polymerase and nascent RNA are quantitatively purified 
through the isolation of chromatin. 
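The size normalization behind Figure 1C amounts to a ratio of read densities. This is a minimal sketch assuming per-gene exonic and intronic read counts and summed region lengths are already available; the function name is hypothetical, and the exact normalization used for the figure may differ.

```python
def exon_intron_ratio(exon_reads: int, exon_length: int,
                      intron_reads: int, intron_length: int) -> float:
    """Length-normalized ratio of exonic to intronic read density for one gene.

    Low values indicate intron-rich (unprocessed) RNA, as expected for the
    chromatin fraction; high values indicate mostly spliced RNA.
    """
    if exon_length <= 0 or intron_length <= 0:
        raise ValueError("region lengths must be positive")
    if intron_reads == 0:
        return float("inf")  # no intronic reads detected for this gene
    exon_density = exon_reads / exon_length
    intron_density = intron_reads / intron_length
    return exon_density / intron_density
```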

The second step of the NET-seq approach requires 
sequencing the 3' ends of the nascent RNA, which localizes 
Pol II genome wide at nucleotide resolution (Churchman and 
Weissman, 2011; Ferrari et al., 2013; Weber et al., 2014). 
In large part, our yeast library construction protocol is used 
(Churchman and Weissman, 2012), with two important changes 
to account for the increased complexity of the human genome. 
First, we addressed reverse transcription (RT) artifacts that arise 
from the significant size of human nascent RNA. We found that 
reverse transcription frequently initiates within the RNA if there 
are stretches as short as six nucleotides of complementarity to 
the RT primer (Figure S1D). When the 3' ends of the nascent
RNA are ligated to a linker pool, consisting of a random hexamer 
at the 5' end followed by a common sequence, ligation efficiency 
increases and mispriming events are dramatically reduced (Fig-
ure S1D). Furthermore, the hexamer serves as a molecular bar-
code for each molecule and enables the computational removal 
of reads arising from residual mispriming events and PCR dupli- 
cates. Second, we deplete abundant mature snRNAs, snoRNAs, 
rRNA, and others through subtractive hybridization targeting 
their 3' ends (Figure S1E, Table S1) to increase sequencing
coverage for nascent transcripts. Finally, library construction 
steps are optimized to be highly efficient (>90%) and are contin- 
ually monitored through quality controls to minimize bias. 
Together, our optimized library construction protocol faithfully 
converts the 3' ends of nascent human RNA to a DNA 
sequencing library that allows the high-fidelity mapping of 
strand-specific Pol II density. 
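The barcode-based removal of PCR duplicates can be sketched as follows. The sketch assumes the random hexamer occupies the first six bases of each read and that reads have already been reduced to their nascent 3'-end coordinate; the `Read` record and function names are hypothetical. It covers only duplicate collapsing: filtering residual mispriming events would need additional sequence checks not shown here.

```python
from collections import namedtuple

# Hypothetical record: genomic 3'-end coordinate of the nascent RNA plus barcode.
Read = namedtuple("Read", ["chrom", "pos", "strand", "barcode"])

BARCODE_LENGTH = 6  # random hexamer at the 5' end of the ligated linker


def split_barcode(read_sequence: str, n: int = BARCODE_LENGTH):
    """Split a read into (barcode, insert), assuming the barcode is read first."""
    if len(read_sequence) <= n:
        raise ValueError("read too short to contain barcode and insert")
    return read_sequence[:n], read_sequence[n:]


def collapse_duplicates(reads):
    """Keep one read per (chrom, position, strand, barcode).

    Reads sharing a 3'-end position but carrying different barcodes are kept
    as distinct molecules; identical combinations are treated as PCR duplicates.
    """
    seen = set()
    unique = []
    for r in reads:
        key = (r.chrom, r.pos, r.strand, r.barcode)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```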

To observe genome-wide transcriptional activity, a NET-seq 
library was prepared from HeLa S3 cells and sequenced to 
high coverage (768 million total reads, 360 million uniquely 
aligned). Each sequencing read was aligned to the human 
genome, and the genomic location of the 3' end of the nascent 
RNA was recorded to map RNA polymerase density with nucle- 
otide resolution. As expected, we recovered nascent RNA from 
all three nuclear RNA polymerases (Pol I, Pol II, Pol III), as well as 
mature chromatin-associated RNAs, such as snRNAs, and 
splicing intermediates (Figures S1F and S1G). Here, we focused
our analysis on Pol-II-synthesized RNAs, but our results suggest



that the NET-seq approach is amenable to the study of other 
RNA polymerases. Importantly, comparison of a biological repli- 
cate library (175 million total reads, 83 million uniquely aligned) 
shows strong agreement, indicating the robustness of the 
approach (Pearson’s coefficient, 0.97, Figure 1D). To demon-
strate that NET-seq is easily adaptable to other cell lines, we
applied our approach to HEK293T cells and obtained data 
from two replicates with similar reproducibility (replicate 1: 
1.203 billion total reads, 555 million uniquely aligned; replicate 
2: 358 million total reads, 135 million uniquely aligned; Fig- 
ure S1H). From these analyses, we conclude that human 
NET-seq is capable of quantitatively monitoring transcriptional 
activity across the human genome and adaptation to new cell 
lines is straightforward. 
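The replicate comparison summarized in Figure 1D (random downsampling of the deeper library, 0.5 pseudocounts for genes with zero counts in one replicate, Pearson's correlation) can be sketched in plain Python. Assumptions: counts are per-gene integers, the pseudocount is added only to the zero entry, genes with no reads in either replicate are dropped, and the correlation is computed on log10 counts, a common choice for such scatterplots that the text does not state. All names are hypothetical.

```python
import math
import random
from collections import Counter


def downsample(counts, target_total, seed=0):
    """Draw target_total reads without replacement from per-gene read counts."""
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    if target_total > len(pool):
        raise ValueError("target exceeds available reads")
    drawn = Counter(random.Random(seed).sample(pool, target_total))
    return {gene: drawn.get(gene, 0) for gene in counts}


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1, rep2, pseudocount=0.5, seed=0):
    """Correlate per-gene counts after matching library sizes."""
    total1, total2 = sum(rep1.values()), sum(rep2.values())
    if total1 > total2:
        rep1 = downsample(rep1, total2, seed)
    elif total2 > total1:
        rep2 = downsample(rep2, total1, seed)
    xs, ys = [], []
    for gene in sorted(set(rep1) & set(rep2)):
        a, b = rep1[gene], rep2[gene]
        if a == 0 and b == 0:
            continue  # assumption: genes silent in both replicates are dropped
        # Pseudocount only for the replicate with zero reads, so log10 is defined.
        xs.append(math.log10(a if a > 0 else pseudocount))
        ys.append(math.log10(b if b > 0 else pseudocount))
    return pearson(xs, ys)
```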

NET-Seq Reveals Transcriptional Activity at Nucleotide 
Resolution Genome Wide 

The resolution afforded by NET-seq and the sequencing 
coverage obtained provide an in-depth view of genome-wide 
transcriptional activity. The highest coverage is observed across 
promoter-proximal regions, which we conservatively defined as 
the region between the earliest annotated transcription start 
site and +1 kb. Within this region, >50% of genes have coverage 
of >1 read per kb per million uniquely aligned reads (RPKM) in 
both HeLa S3 (Figures 2A and 2B) and HEK293T cells (Fig- 
ure S2C). When coverage is calculated across entire genes, 
the percentage decreases to <30% in both cell lines due to the 
prevalence of promoter-proximal pausing (Figures 2A and 2B). 
Indeed, most (89% in HeLa S3 cells and 94% in HEK293T cells) 
expressed genes display promoter-proximal pausing defined by 
a traveling ratio (coverage ratio between a narrow promoter- 
proximal region and the gene body) of >2, consistent with earlier 
observations in mouse embryonic stem cells (Figures 2C and 
S2D) (Rahl et al., 2010). Furthermore, we detect unstable RNA 
production, antisense transcription upstream of many promoters 
(89% in HeLa S3 cells and 82% in HEK293T cells), transcription 
downstream of many polyadenylation sites (95% in HeLa S3 
cells and 88% in HEK293T cells), and enhancer RNAs (Figures 
S2A, S2B, S2E, and S2F).
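The coverage and pausing metrics used above follow from the definitions in the Figure 2 legend: RPKM normalizes read counts by region length (kb) and by millions of uniquely aligned reads, and the traveling ratio divides promoter-proximal RPKM by gene-body RPKM. The sketch assumes region read counts are already tallied and treats the −30 to +300 bp window as half-open (330 bp); function names are hypothetical.

```python
PROMOTER_PROXIMAL = (-30, 300)  # bp relative to the TSS, as in Figure 2C


def rpkm(reads, region_length_bp, total_uniquely_aligned):
    """Reads per kb of region per million uniquely aligned reads."""
    return reads / (region_length_bp / 1e3) / (total_uniquely_aligned / 1e6)


def traveling_ratio(pp_reads, body_reads, body_length_bp, total_uniquely_aligned):
    """RPKM of the tight promoter-proximal region divided by gene-body RPKM."""
    pp_length = PROMOTER_PROXIMAL[1] - PROMOTER_PROXIMAL[0]  # half-open: 330 bp
    pp = rpkm(pp_reads, pp_length, total_uniquely_aligned)
    body = rpkm(body_reads, body_length_bp, total_uniquely_aligned)
    return pp / body if body > 0 else float("inf")


def is_paused(tr, threshold=2.0):
    """Promoter-proximal pausing call used in the text: TR > 2."""
    return tr > threshold
```

For example, 33 reads in the promoter-proximal window and 100 reads over a 10 kb gene body in a library of 1 million uniquely aligned reads give RPKMs of about 100 and 10, so a traveling ratio of about 10.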

NET-seq data describe transcriptional activity at many length 
scales. At the single-gene level, strong signal is observed at pro- 
moter regions and across introns (Figure 2D, top and middle). 
Signal variation across the gene body suggests that transcription 
elongation is discontinuous following release from promoter- 
proximal regions and that pausing is a general feature during pro- 
ductive Pol II transcription. Near transcription start sites (TSSs), 
NET-seq detects sense and antisense transcription of divergent 
promoters at single-nucleotide resolution, revealing that pro- 
moter-proximal pausing does not occur at only one position; 
instead, there are narrow regions of high Pol II density (Figure 2D, 
bottom). Together, NET-seq data uncover key features of human 
transcription activity, and the high resolution and the coverage of 
the data provide deeper insight into these complexities. 

Widespread Convergent Transcription in Promoter- 
Proximal Regions 

Several previous studies showed widespread divergent tran- 
scription at eukaryotic promoters (Churchman and Weissman, 

Figure 2. NET-Seq Reports on Transcription Globally and Locally 

(A) Schematic defining gene regions used in analysis of NET-seq data. 

(B) Distributions of the percent of expressed Pol-II-transcribed protein-coding genes (n = 19,108), with a given Pol II coverage for different gene regions as defined
in Figure 2A. 

(C) Distributions of the percent of well-expressed Pol II protein-coding genes (n = 8,912) with a given traveling ratio. Well-expressed Pol II genes are defined as 
those genes with an RPKM of 1 or greater in a tight promoter-proximal region (-30 bp to +300 bp of the TSS). Traveling ratio (TR) is defined as the RPKM of the 
tight promoter-proximal region divided by the RPKM of the gene body region. 

(D) Number of NET-seq reads at three zoom levels around the PRPF38B locus for HeLa S3 cells. Reads that aligned to the positive strand (+) are in violet, and 
reads that aligned to the negative strand (-) are in red. The TSS and the direction of transcription are indicated by an arrow. Annotation of exonic and intronic 
regions is shown as boxes and lines, respectively. 

RPKM, reads per kb per million uniquely aligned reads at Pol-II-transcribed genes; TSS, transcription start site; pA, polyadenylation site. See also Figure S2.



2011; Core et al., 2008; Neil et al., 2009; Seila et al., 2008; Xu 
et al., 2009). We analyzed this phenomenon for a stringently 
defined set of Pol-II-transcribed genes that do not overlap other
genes within 2.5 kb of the TSS and polyadenylation site and are 
longer than 2 kb to avoid misinterpreting transcription from other 
genes as antisense transcription (n = 3,937 genes). Analysis of 
regions 2 kb upstream and downstream of transcription start 
sites with broad coverage and no sign of missing overlapping 
annotation (n = 1,488; see Experimental Procedures) reveals
divergent transcription in 77% of promoter-proximal regions, 
consistent with other studies (Figure 3A, left) (Core et al., 2008; 
Seila et al., 2008). Surprisingly, close inspection of our data re- 
vealed an unappreciated form of antisense transcription near 
promoters. At 25% of promoter-proximal regions, we observe 
antisense transcription originating downstream of sense tran- 
scription (Figure 3A, right), which we term convergent transcrip- 
tion. Convergent transcription is clearly observed at single-pro- 
moter regions (Figures 3B and 3C), and in most cases, such as 
near the KLHL9 promoter, convergent transcription is accompa- 
nied by divergent transcription (Figure 3B). However, it also 
occurs in the absence of divergent transcription (for example,
FAM133B, Figure 3C). Convergent antisense transcription is not
specific to NET-seq: a re-analysis of GRO-seq data from mouse
embryonic stem cells (Jonkers et al., 2014) also reveals it
(Figure S3A).

To characterize the structural attributes of these modes of
transcriptional activity, distances between sense and antisense 
peaks were determined for each promoter-proximal region (Fig- 
ure 3D). A stereotypical distance (250 ± 50 bp) separates the 
sense and antisense peaks in divergent transcription, while the 
sense and antisense peaks in convergent transcription are also 
separated by a stereotypical distance (150 ± 50 bp), indi-
cating that convergent antisense transcription is not simply the 
result of spurious antisense transcription initiation events across 
the promoter-proximal region (Figure 3D). 
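The peak-distance measurement can be sketched as follows. This is a simplification: it takes the single highest antisense position on each side of the major sense peak, whereas the published analysis (see Experimental Procedures) may smooth the signal and call peaks differently. The `min_reads` threshold and the function names are hypothetical.

```python
def argmax(values):
    """Index of the first maximum."""
    return max(range(len(values)), key=values.__getitem__)


def antisense_peak_distances(sense, antisense, min_reads=5):
    """Classify antisense peaks relative to the major sense peak.

    sense, antisense: per-nucleotide NET-seq counts over one promoter region,
    both indexed in the direction of sense transcription (index 0 upstream).
    Returns a dict with 'divergent' (antisense peak upstream) and/or
    'convergent' (antisense peak downstream) distances in bp.
    """
    s = argmax(sense)
    result = {}
    upstream, downstream = antisense[:s], antisense[s + 1:]
    if upstream and max(upstream) >= min_reads:
        result["divergent"] = s - argmax(upstream)
    if downstream and max(downstream) >= min_reads:
        result["convergent"] = argmax(downstream) + 1
    return result
```

Collecting these distances over many promoters yields the kind of histogram shown in Figure 3D, where divergent and convergent peaks cluster at characteristic spacings.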

A Distinct Chromatin Structure Associated with 
Convergent Transcription 

Many chromatin modifiers control antisense transcription 
(Churchman and Weissman, 2011; DeGennaro et al., 2013; 
Kim et al., 2012; Marquardt et al., 2014; Whitehouse et al., 
2007), and we asked how promoter-proximal transcriptional 
Figure 3. Convergent Transcription Observed at the Promoter-Proximal Regions of Lower-Expressed Genes

(A) Promoters are classified depending on whether they contain a peak of convergent antisense Pol II transcription, as illustrated by the cartoon above the heat maps. A stringent set of promoter-proximal regions was selected for analysis to ensure that transcription arising from other transcription units would not bias classification (see Experimental Procedures). Heatmaps of Pol II scaled density are displayed for each class (left, no convergent peak, n = 931 genes; right, convergent peak, n = 373). For each gene, the sense (violet) and antisense (red) raw signal is analyzed separately and normalized to vary from 0 to 1. Both signals are superposed, centered on the major sense transcription peak. Genes are sorted by the distance between the sense and antisense peaks. Mean Pol II density profile is displayed below the heatmaps, where raw sense and antisense data are normalized together to vary between 0 and 1 and smoothed with a 50 bp sliding window average. Solid lines indicate the mean values, and shading shows the 95% confidence interval. Sense transcription is shown in violet, and antisense transcription is shown in red.

(B and C) Examples of NET-seq reads in two promoter-proximal regions that display convergent Pol II transcription.

(D) Histogram of distances between the major peak of Pol II density in the sense direction and the peak(s) in the antisense direction for all promoters with convergent and/or divergent peaks (n = 1,304).

(E) Distributions of the percentage of genes with a given Pol II density in the gene body region, as defined in Figure 2A. Genes with only convergent transcription (yellow, n = 151) or only divergent transcription (blue, n = 931) in their promoter-proximal regions are compared. The p value is calculated by the Kolmogorov-Smirnov test. RPKM, reads per kb per million uniquely aligned reads at Pol-II-transcribed genes. See also Figure S3.






activity relates to local chromatin structure. We used DNase-seq
to map regions of open chromatin and highly positioned nucleo-
somes in the same HeLa S3 cells used for NET-seq (Thurman
et al., 2012). We examined the distribution of DNase I accessi- 
bility relative to promoter-proximal peaks in NET-seq data 



(Figure 4). At genes that have a sense 
Pol II peak (representing promoter-proxi- 
mally paused Pol II), we observe strong 
DNase I hypersensitivity upstream of the 
peak, marking the canonical pro-
moter (Figure 4A), and reduced DNase
sensitivity downstream of the peak corre-
sponding to the +1 nucleosome. Thus,
promoter-proximal pausing occurs prior
to the +1 nucleosome in mammalian
cells. Comparison of DNase I data rela- 
tive to the divergent antisense peak shows that this transcrip- 
tional activity originates from the 5' side of the promoter 
hypersensitivity region, consistent with the model that divergent 
antisense transcription is a consequence of an open chromatin 
region (Seila et al., 2009) (Figure 4C). In contrast, genes with 
Figure 4. Convergent Transcription Is Associated with a Distinct Chromatin Structure

(A-C) Heatmaps showing DNase I accessibility in HeLa S3 cells surrounding all (A) promoters (n = 1,304) aligned to the sense NET-seq peak, (B) promoters that
have convergent transcription (n = 373) aligned to the antisense convergent NET-seq peak, and (C) promoters that have divergent transcription (n = 1,153) aligned
to the antisense divergent peak. Below each heatmap is the mean DNase I accessibility profile of the region shown in the heatmap. The raw data are smoothed 
with a 150 bp sliding window in 20 bp steps. Solid lines indicate the trimmed mean (removing 5% of extreme data points). Above each heatmap are 
arrows showing the transcriptional activity observed in each promoter-proximal region. A cartoon displays the chromatin structure determined by analysis of the 
DNase-seq data. 



convergent transcription show two distinct peaks in DNase I 
hypersensitivity: a canonical promoter peak and a downstream 
peak located proximal to the convergent antisense peak (Fig- 
ure 4B). Thus, convergent antisense transcription likely origi- 
nates locally. Furthermore, the dip between the two peaks of 
DNase I hypersensitivity likely represents the +1 nucleosome,
consistent with the ~150 bp spacing between the sense and 
convergent antisense Pol II peaks (Figures 3D and 4B). These re- 
sults indicate that convergent transcription reflects sense and 
antisense transcription that initiates locally and undergoes pro-
moter-proximal pausing flanking the +1 nucleosome.
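The aggregate DNase I profiles in Figure 4 combine windowed smoothing (150 bp windows in 20 bp steps) with a trimmed mean across promoters. A minimal sketch, assuming per-promoter accessibility tracks are already aligned on the relevant NET-seq peak; splitting the 5% of removed extreme points evenly between the two tails is our assumption, and the function names are hypothetical.

```python
def sliding_window_mean(signal, window=150, step=20):
    """Mean signal in windows of `window` bp advanced by `step` bp."""
    return [sum(signal[i:i + window]) / window
            for i in range(0, len(signal) - window + 1, step)]


def trimmed_mean(values, proportion=0.05):
    """Mean after discarding `proportion` of points, split between both tails."""
    ordered = sorted(values)
    # Small epsilon guards against float rounding in len * proportion.
    k = int(len(ordered) * proportion / 2 + 1e-9)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def aggregate_profile(rows, proportion=0.05):
    """Column-wise trimmed mean across promoters (rows of equal length)."""
    return [trimmed_mean(column, proportion) for column in zip(*rows)]
```

A trimmed mean keeps a handful of extremely accessible loci from dominating the average profile while still using almost all promoters.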

Convergent Transcription Is a Feature of Lower- 
Expressed Genes 

Convergent transcription can regulate gene expression through 
transcriptional interference mechanisms (Callen et al., 2004; El- 
ledge and Davis, 1989; Gullerova and Proudfoot, 2012; Hobson 
et al., 2012; Martens et al., 2004; Prescott and Proudfoot, 2002; 
Shearwin et al., 2005). Thus, we considered whether promoter- 
proximal convergent transcription may be involved in release of 
Pol II from promoter-proximal pausing into productive elonga-
tion. We compared Pol II density within the gene body (+1 kb
after the transcription start site to the polyadenylation site, illus- 
trated in Figure 2A) at genes that display only convergent tran- 
scription to genes that display only divergent transcription. 
Notably, genes with only convergent transcription near their 
promoters show consistently less transcription downstream of 
their promoter regions (Figure 3E) (1.8-fold less on average, 
Kolmogorov-Smirnov test, p < 10“®). Comparison of less strin- 
gently defined sets of genes, such as all genes with convergent 
transcription to all genes without convergent transcription, 
showed a similar effect (Figure S3B). In agreement with this 
observation, analysis of ENCODE HeLa S3 ChIP-seq data re- 
veals that H3K79me2 histone marks, which correlate with tran- 
scription elongation, occur at significantly lower levels in the 
gene bodies of genes with convergent antisense transcription 
(Figure S3C) (Consortium, 2012; Guenther et al., 2007; Wozniak 
and Strahl, 2014). Thus, convergent antisense transcription 
could interfere with productive transcription elongation or could 
be a consequence of less-productive elongation. Either of these 
possibilities could be directly mediated by Pol II or by another 
factor, such as chromatin. 
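The comparison in Figure 3E rests on the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical cumulative distributions. The sketch computes only the statistic; in practice a p value would come from a library routine such as `scipy.stats.ks_2samp`. The fold-difference helper assumes that "on average" means a ratio of means, which the text does not specify; names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the ECDFs only change at observed values
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def mean_fold_difference(reference, other):
    """Ratio of mean values, e.g. divergent-only over convergent-only RPKM."""
    return (sum(reference) / len(reference)) / (sum(other) / len(other))
```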
Figure 5. P-TEFb Inhibition Proportionally Affects Levels of Sense
Transcription and Convergent Antisense Transcription 

(A) Western blot analysis of whole-cell extract of HeLa S3 cells with flavopiridol 
(FP) treatment (1 hr). The percentage at the bottom of each lane is the amount 
of the respective protein (as determined by image quantification) before and 
after FP treatment. GAPDH serves as a loading control. 

(B) Meta-gene analysis of NET-seq data from HeLa S3 cells treated with 
1 μM FP (purple and red) or DMSO control (gray) for 1 hr. Arrows indi-
cate regions where transcription is affected by FP treatment. Genes that
had convergent and/or divergent peaks (described in Experimental 
Procedures) in both data sets are included in the analysis (n = 615). NET- 
seq signal from each promoter region (± 2.5 kb centered at the sense 
transcription peak) is binned into 10 bp windows. For each sample,
sense and antisense signal are normalized together to vary between 



To test whether convergent antisense transcription is a conse- 
quence of reduced sense transcription elongation, we globally 
suppressed productive elongation by inhibiting positive tran- 
scription elongation factor b (P-TEFb). Most promoter-proxi- 
mally paused Pol II are released through recruitment of P-TEFb 
that phosphorylates multiple proteins, including Ser2 residues 
of the Pol II CTD (Kim and Sharp, 2001; Peterlin and Price, 
2006). Therefore, active P-TEFb greatly facilitates the transition 
to productive elongation but does not affect transcription initia- 
tion (Lis et al., 2000; Peterlin and Price, 2006; Rahl et al., 
2010). We performed NET-seq analysis on HeLa S3 cells 
exposed to the P-TEFb inhibitor flavopiridol (FP) (Chao and 
Price, 2001) or DMSO alone. As expected, after 60 min, FP 
reduced Pol II CTD Ser2 phosphorylation, but phosphorylation 
of Ser5 residues and overall Pol II levels remained unchanged 
(Figures 5A and S4). We generated NET-seq libraries from 
HeLa S3 cells after a 1 hr FP treatment or DMSO control (FP 
treatment NET-seq data set, 486 million total reads, 262 million uniquely
mapped reads; DMSO control NET-seq data set, 491 million to- 
tal reads, 263 million uniquely mapped). In agreement with pre- 
vious studies, we observe a global decrease in Pol II density 
outside of promoter-proximal regions compared to the DMSO 
control (Figure 5B, arrows) (Flynn et al., 2011; Jonkers et al., 
2014; Rahl et al., 2010). Thus, FP treatment reduces productive 
elongation of most genes. To quantify the effect of FP treatment 
on convergent transcription, we calculated the ratio of con- 
vergent antisense to sense transcription at all promoter-proximal 
regions. If convergent transcription were a simple consequence 
of lower expression, it should not only increase relative to
promoter-proximal sense transcription following FP treatment
but, importantly, also appear in genes where it
was not detected before. We observe that the convergent anti- 
sense-to-sense transcription ratio remains constant following 
FP treatment, indicating that sense and convergent antisense 
transcription levels covary, and we do not detect a new subpop- 
ulation of genes with convergent transcription in their promoter- 
proximal regions (Figure 5C). This result suggests that the loss of
productive sense transcription elongation is not sufficient to
induce convergent transcription. Thus, because convergent antisense
transcription does not appear to be a simple consequence of low
sense expression, it may instead contribute to it.
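The quantity plotted in Figure 5C can be sketched as follows. Assumptions: strand-specific tracks are indexed in the sense direction, the 10-read minimum applies to sense plus antisense reads in the 500 bp window, and all names are hypothetical. If sense and convergent antisense transcription covary, control and FP ratios for the same gene fall near the diagonal.

```python
def convergent_to_sense_ratio(sense, antisense, tss_index, window=500, min_reads=10):
    """Antisense/sense NET-seq signal summed over `window` bp downstream of the TSS.

    Returns None when the region has fewer than `min_reads` reads in total
    or no sense reads, approximating the read-count filter in Figure 5C.
    """
    s = sum(sense[tss_index:tss_index + window])
    a = sum(antisense[tss_index:tss_index + window])
    if s + a < min_reads or s == 0:
        return None
    return a / s


def paired_ratios(control, treated, **kwargs):
    """Ratios for genes passing the filter in both samples.

    control, treated: dicts mapping gene -> (sense, antisense, tss_index).
    """
    pairs = {}
    for gene in control.keys() & treated.keys():
        r_ctrl = convergent_to_sense_ratio(*control[gene], **kwargs)
        r_trt = convergent_to_sense_ratio(*treated[gene], **kwargs)
        if r_ctrl is not None and r_trt is not None:
            pairs[gene] = (r_ctrl, r_trt)
    return pairs
```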

Impact of Transcription Factor Occupancy on Pol II 
Elongation 

DNA-bound transcription factors (TFs) have the potential to 
obstruct elongating Pol II. To investigate the relationship be- 
tween TF occupancy and Pol II progress, we expanded our 



0 and 1 and then smoothed with a 50 bp sliding window. Solid lines 
indicate the mean normalized Pol II density, and shading shows the 95% 
confidence interval. 

(C) A scatterplot comparing the convergent-to-sense ratio after treatment with 
DMSO (control) and after FP treatment for a stringent subset of non-over-
lapping genes with at least 10 reads across the 500 bp region after TSS in both
samples (n = 1,667). The ratio is the sum of NET-seq signal on the antisense 
strand versus the sense strand across the 500 bp region after the TSS. The 
handful of genes with a ratio of 0 are not plotted. 

TSS, transcription start site. See also Figure S4. 
Figure 6. Pol II Pausing Associated with Transcription Factor Occupancy

(A) Average NET-seq signal around 16,339 CTCF motifs with accessible chromatin. In red is NET-seq signal oriented to the strand of the motif (Pol II transcription 
from left to right), and in green is NET-seq on the other strand (Pol II transcription from right to left). The CTCF binding motif is pictured above. The NET-seq data 
were smoothed by a 10 bp sliding window average. 

(B) Mean NET-seq (red) and DNase I cleavage (gray) signal (10 bp windowed averages) surrounding the top 5,000 CTCF motifs sorted by NET-seq signal. 

(C) Average NET-seq signal (smoothed by a 10 bp sliding window average) around 731 YY1 motifs with accessible chromatin. In red is NET-seq signal oriented to 
the strand of the motif (Pol II transcription from left to right), and in green is NET-seq on the other strand (Pol II transcription from right to left). The YY1 binding motif 
is pictured above. 

(D) Mean per nucleotide NET-seq (red) and DNase I cleavage (gray) signal surrounding the top 200 YY1 motifs sorted by NET-seq signal. Both signals are 
presented as 10 bp windowed averages. Schematic of nucleosome positioning relative to YY1 inferred from DNase I accessibility is above plot. 



DNase-seq data from HeLa S3 cells to genomic footprinting 
depth (269 million uniquely mapped genomic reads), enabling 
detailed mapping of the occupancy of TF recognition sites within 
DNase I hypersensitivity sites (DHSs). As CTCF is implicated in 
Pol II pausing in vitro and within the cell (Shukla et al., 2011), 
we quantified NET-seq signal and DNase-seq signal around 
CTCF recognition sites within DHSs on both strands. We 
observed higher Pol II density just upstream of the CTCF sites, 
suggesting that CTCF might represent a barrier to Pol II elonga- 
tion genome wide (Figures 6A and 6B). Interestingly, the NET- 
seq signal around these sites differs in magnitude for each 
strand, indicating that CTCF may pose strand-specific obstacles 
(Figure 6A). 
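Strand-oriented averages like those in Figures 6A and 6C can be computed as sketched here, assuming per-nucleotide count arrays for each strand of one chromosome and motif calls that give a center coordinate and strand; names are hypothetical. Smoothing (a 10 bp sliding average in the figure) would be applied to the returned profiles afterwards.

```python
def motif_metaprofile(plus, minus, motifs, flank=100):
    """Average strand-oriented NET-seq signal around motif centers.

    plus, minus: per-nucleotide NET-seq counts on the + and - strands.
    motifs: iterable of (center, motif_strand). Windows around minus-strand
    motifs are reversed so every profile reads in the motif's direction.
    Returns (same_strand, other_strand) mean profiles of length 2*flank+1.
    """
    n = 2 * flank + 1
    same, other, used = [0.0] * n, [0.0] * n, 0
    for center, strand in motifs:
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(plus):
            continue  # skip motifs too close to the chromosome ends
        if strand == "+":
            s_win, o_win = plus[lo:hi], minus[lo:hi]
        else:
            s_win, o_win = minus[lo:hi][::-1], plus[lo:hi][::-1]
        for i in range(n):
            same[i] += s_win[i]
            other[i] += o_win[i]
        used += 1
    if used == 0:
        return [], []
    return [v / used for v in same], [v / used for v in other]
```

Reversing windows for minus-strand motifs means the "same strand" profile always describes Pol II approaching the motif from the same side, which is what allows a strand-specific asymmetry like the one noted for CTCF to show up in the average.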

As transcriptional pausing has been seen upstream of 
nucleosomes in yeast and Drosophila cells (Churchman and 
Weissman, 2011; Mavrich et al., 2008; Weber et al., 2014), we 
investigated Pol II density around YY1, a canonical promoter-
centric transcription factor (Xi et al., 2007) thought to position +1
nucleosomes (Vierstra et al., 2014). Thus, we speculated that 
YY1 occupancy might impact Pol II elongation. Given that 
poly-zinc finger TFs engage DNA asymmetrically, we also spec- 
ulated that any impact on Pol II might also be strand specific. We 
observed a peak in NET-seq signal precisely at YY1 sites in 
DHSs, consistent with YY1-directed pausing (Figures 6C and
6D). Strikingly, this effect was highly directional and was predomi-
nant when Pol II engaged YY1 from the upstream direction (Fig-
ure 6D). These results indicate that TFs might directly regulate 
Pol II elongation in direction- or strand-specific ways. 

Fine Structure of Pol II Pausing along Constitutive and 
Alternative Exons 

Alteration to transcription elongation rates affects splicing out- 
comes, which has led to the proposal of the kinetic model of tran- 
scription and splicing coupling (Dujardin et al., 2014; Fong et al.,
Figure 7. Pol II Density across Exons Reveals a Stereotypical Pausing Pattern that Depends on Splicing Outcome

(A) Schematic of the classification of constitutive, alternative retained, and alternative skipped exons based on annotated isoforms and detected levels in 
cytosolic RNA-seq data. 

(B) A stringent set of exons was selected for analysis from genes containing NET-seq signal of >1 RPKM (B, see Experimental Procedures). Heatmaps and meta-
exon analysis of HeLa S3 Pol II density across each type of exon, as defined in (A) (left to right: constitutive exons, n = 1,334; alternative retained, n = 14,582; and
alternative skipped, n = 6,348). NET-seq signal from each exon (± 25 bp) is normalized to vary from 0 to 1 (white to black scale in the heatmaps). Solid lines on the 
meta-exon plots indicate the mean values, and the gray shading represents the 95% confidence interval. The single-nucleotide positions where splicing in- 
termediates align (3' ends of introns and exons) were entirely removed from analysis (see Experimental Procedures) and appear as a blank position in the figures. 
(C-E) Raw NET-seq reads across the constitutive exon 2 within the DDX3X gene (C), alternative retained exon 2 within the SIK1 gene (D), and the alternative 
skipped exon 10 within the CD55 gene (E). 

(F) Distribution of the percent of exons or introns with a given Pol II density. 

RPKM, reads per kb per million Pol II uniquely aligned reads. See also Figure S5. 



201 4; Ip et al., 201 1 ; de la Mata et al., 2003; Roberts et al., 1 998). 
However, the degree to which transcription rate is modulated 
locally around exons is unclear. Higher Pol II density at human 
exons versus introns was reported using Pol II ChIP-seq and 
ChIP-chip (Brodsky et al., 2005; Schwartz et al., 2009), but in 



another study, no significant difference was observed (Spies 
et al., 2009). Furthermore, the precise pattern across individual 
exons could not be resolved. In Drosophila cells, PRO-seq 
observed high Pol II density across exons and detected a high 
enrichment of Pol II density at the 5' ends (Kwak et al., 2013). 
We analyzed NET-seq data at constitutive exons and found significantly higher coverage than at introns in both HeLa S3 (2.4× higher) and HEK293T cells (2.2× higher) (p < 10“^^, Kolmogorov-Smirnov test), suggesting that transcription elongation is slower at exons in human cells (Figures 7A, 7B, 7F, and S5A). Any contamination from processed mRNA would inflate these differences; however, our quality controls (Figures 1B, 1C, and S1A) suggest that any such effect is small. Strikingly, NET-seq signal across exons is not uniform: sharp increases in Pol II density occur in the few base pairs surrounding the 5' and 3' ends of constitutive exons, indicating strong Pol II pausing at exon boundaries (Figure 7B). Because splicing intermediates are known NET-seq contaminants (Figure S1G), we removed from analysis the single-nucleotide positions where they align. In addition, a broader peak of Pol II density is present ~17 nt before the 3' end of exons. The general features of this pattern are observed at individual exons, for example, exon 2 of the DDX3X and SIK1 genes (Figures 7C and 7D). Finally, we observe similar trends in the NET-seq data from HEK293T cells (Figure S5B). This analysis suggests that exon borders impose a structured barrier to Pol II elongation.
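The exon-versus-intron comparison rests on a fold difference and a two-sample Kolmogorov-Smirnov test. The sketch below computes both in plain Python, assuming per-region Pol II densities (reads per bp) have already been computed and that "fold higher" means a ratio of mean densities, which the text does not specify. The function names are hypothetical; for a p value, SciPy's `scipy.stats.ks_2samp` implements the full test.

```python
from bisect import bisect_right
from statistics import mean
from typing import Sequence


def fold_difference(exon_density: Sequence[float], intron_density: Sequence[float]) -> float:
    """Ratio of mean exon density to mean intron density (assumed definition)."""
    return mean(exon_density) / mean(intron_density)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two ECDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


exons = [0.9, 1.1, 1.3, 1.5]    # hypothetical densities
introns = [0.4, 0.5, 0.6, 0.7]  # hypothetical densities
print(round(fold_difference(exons, introns), 2), ks_statistic(exons, introns))  # 2.18 1.0
```

The KS test compares whole distributions rather than means, which is why it can flag a shift even when a handful of highly expressed introns pull the mean intron density up.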

Most human exons can be alternatively spliced, with retained exons varying between cell types (Pan et al., 2008; Wang et al., 2008). We expanded our analysis to alternatively spliced exons and investigated whether transcriptional pausing varies at exons with different splicing outcomes. We focused our analysis on genes with a NET-seq RPKM of >1 in the gene body (Figure S5A) and defined skipped exons as those undetected in the cytoplasmic RNA-seq data (Figure 7A). As for constitutive exons, retained alternative exons have higher Pol II density than introns (Figure 7F). These exons also have a pausing pattern similar to that of constitutive exons, visible both by meta-exon analysis and at individual exons (Figures 7B and 7D). Interestingly, Pol II density is lower at skipped exons than at alternative retained exons (Figure 7F). Strikingly, the Pol II density pattern is similarly shaped across skipped and retained exons, albeit significantly different in amplitude (Figures 7B and 7E). The residual pausing pattern at skipped exons could reflect a small number of retained exons that are misannotated as skipped. Finally, the same differences in the Pol II density patterns across retained and skipped exons are observed in HEK293T cells (Figure S5B). Together, these data show that Pol II recognizes exon structures with different processing fates, suggesting that alternative splicing is kinetically coupled to transcription elongation genome wide.

DISCUSSION 

Here, we demonstrate that human NET-seq provides complete, strand-specific maps of transcription at single-nucleotide resolution. NET-seq thereby defines transcriptional pausing sites and directly measures unstable transcripts. Finally, NET-seq instantaneously reports the transcription status of genes, in contrast to RNA-seq, which reports the balance between RNA synthesis and degradation.

Our work describes an unappreciated aspect of promoter-proximal transcription: the presence of convergent transcription at many human genes. Importantly, we show that convergent transcription is characteristic of genes with lower expression, suggesting a potential role in the regulation of promoter-proximal pausing. Prominent DNase I hypersensitivity sites flanking the convergent antisense peak indicate that promoter-proximal convergent transcription reflects initiation at a defined promoter located a characteristic distance from the canonical sense promoter.

Other than expression level, only one commonality is found between the genes with convergent transcription: the dinucleotide CC occurs slightly more frequently in regions displaying convergent transcription (12.4% ± 0.4% for convergent versus 11.1% ± 0.2% for non-convergent regions). Thus, convergent transcription appears to be a prevalent phenomenon that is not restricted to a specific class of genes. An intriguing possibility is that paused antisense Pol II directly blocks or clashes with sense transcription, as can occur in yeast (Prescott and Proudfoot, 2002). The sense and convergent antisense peaks are too far apart (~150 bp) to reflect direct contact between paused polymerases, but the DNase-seq data reveal that this distance likely represents the +1 nucleosome positioned between them. Interference could arise through positioning of the +1 nucleosome or through indirect mechanisms such as transcription-induced changes in DNA topology, chromatin modifications, or transcription factor occupancy. In any event, NET-seq data do not resolve whether sense and antisense transcription occur simultaneously, as the approach requires averaging over a population of cells. Therefore, potential roles of convergent transcription during initiation, elongation, and termination will have to be investigated both within cell populations and at the single-cell level.
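The CC percentages above are dinucleotide frequencies summarized across regions. One way to compute such numbers is sketched below, assuming that overlapping dinucleotides are counted, that positions containing N are skipped, and that the ± values are standard errors of the mean; none of these choices is stated in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def dinucleotide_percent(seq: str, dinuc: str = "CC") -> float:
    """Percent of overlapping dinucleotide positions equal to `dinuc`, skipping N."""
    seq = seq.upper()
    total = hits = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if "N" in pair:
            continue  # ambiguous base: position not counted
        total += 1
        if pair == dinuc:
            hits += 1
    return 100.0 * hits / total if total else 0.0


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (needs at least two values)."""
    return mean(values), stdev(values) / sqrt(len(values))


regions = ["CCAGGCCT", "ATCCCGTA", "GGATTACA"]  # hypothetical promoter sequences
print(mean_and_sem([dinucleotide_percent(r) for r in regions]))
```

Counting overlapping pairs means a run such as "CCC" contributes two CC occurrences, so the convention matters when comparing against published percentages.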

Our study yields a global picture of how transcription elongation is altered at alternatively spliced exons in human cells. Changes in transcription elongation influence alternative splicing, which is thought to be mediated either by the differential recruitment of splicing factors (recruitment model) or by biasing kinetic competition between multiple splicing outcomes (kinetic model) (Bentley, 2014; Dujardin et al., 2013; Kornblihtt et al., 2004). Here, we show that alternative splicing outcomes in human cells are associated with Pol II density at exons and with strong pauses at exon 5' and 3' ends, consistent with the kinetic model. What causes pauses at exons is an important question. Nucleosomes can influence transcriptional pausing (Churchman and Weissman, 2011; Hodges et al., 2009; Izban and Luse, 1991; Skene et al., 2014), and, importantly, nucleosome occupancy and histone modifications transition at exon boundaries according to splice site strength (Andersson et al., 2009; Chodavarapu et al., 2010; Huff et al., 2010; Schwartz et al., 2009; Spies et al., 2009; Tilgner et al., 2009). DNA sequence and DNA methylation at exon boundaries could also contribute to pausing, because sequence elements have been shown to cause transcriptional pausing (Gelfman et al., 2013; Herbert et al., 2006; Kassavetis and Chamberlin, 1981; Larson et al., 2014; Maizels, 1973; Vvedenskaya et al., 2014). Additionally, transcription factors could underlie pausing at retained exons, as is the case with CTCF binding at exon 5 of the CD45 gene (Shukla et al., 2011). The broad peak of Pol II density 17 bp from the 3' end of the exon may reflect Pol II backtracking during recovery from the strong pause at the 3' end of the exon. Backtracking would produce small cleavage products, consistent with the population of tiny RNAs previously identified in this region (Taft et al., 2009).

We expect adaptation of human NET-seq to any human cell 
type to be straightforward, resulting in a tool to illuminate a 
variety of biological processes. Future applications include 
high-resolution analyses of transcription regulation across 
cell types, responses to signaling pathways, and cellular 
differentiation. 

EXPERIMENTAL PROCEDURES 
Cell Fractionation and RNA Purification 

Cell fractionation is performed as described previously (Bhatt et al., 2012; Pandya-Jones and Black, 2009), based on Wuarin and Schibler (1994), with modifications. All steps are conducted on ice or at 4°C and in the presence of 25 μM α-amanitin, 50 units/ml SUPERaseIn, and cOmplete protease inhibitors. HeLa S3 cells and HEK293T cells are grown in DMEM containing 10% FBS, 100 U/ml penicillin, and 100 μg/ml streptomycin to 90% confluency. Following lysis of 1 x 10^ cells, the nuclei are washed with nuclei wash buffer (0.1% Triton X-100, 1 mM EDTA, in 1x PBS) to remove cytoplasmic remnants. Nuclei lysis is performed without MgCl2 (1% NP-40, 20 mM HEPES [pH 7.5], 300 mM NaCl, 1 M urea, 0.2 mM EDTA, 1 mM DTT). The success of cell fractionation is monitored by western blot analysis and subcellular RNA-seq.
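Preparing the nuclei lysis buffer above is a routine C1·V1 = C2·V2 calculation. The sketch below takes the final concentrations from the recipe in the text, but every stock concentration is a hypothetical placeholder to be replaced with the stocks actually on hand. Urea is often weighed in as a solid instead, and the α-amanitin, SUPERaseIn, and protease inhibitors are added separately and are not modeled here.

```python
from typing import Dict, Tuple

# Final concentrations follow the nuclei lysis buffer described above.
# Stock concentrations are hypothetical placeholders; replace with your own.
LYSIS_BUFFER: Dict[str, Tuple[float, float]] = {
    # component: (final concentration, stock concentration), same units per row
    "NP-40 (%)": (1.0, 10.0),
    "HEPES pH 7.5 (mM)": (20.0, 1000.0),
    "NaCl (mM)": (300.0, 5000.0),
    "Urea (M)": (1.0, 8.0),
    "EDTA (mM)": (0.2, 500.0),
    "DTT (mM)": (1.0, 1000.0),
}


def stock_volumes(final_volume_ml: float, recipe=LYSIS_BUFFER) -> Dict[str, float]:
    """Stock volume per component from C1 * V1 = C2 * V2, plus water to volume."""
    volumes = {}
    for name, (final_conc, stock_conc) in recipe.items():
        if final_conc > stock_conc:
            raise ValueError(f"{name}: stock is less concentrated than the target")
        volumes[name] = final_conc * final_volume_ml / stock_conc
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to reach the final volume")
    volumes["water"] = water
    return volumes


for component, ml in stock_volumes(10.0).items():
    print(f"{component:>18}: {ml:.3f} ml")
```

Note that MgCl2 is deliberately absent, matching the Mg-free lysis condition in the procedure.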

Sequencing Library Constructions 

For NET-seq, library preparation is performed as described by Churchman and Weissman (2011, 2012), with modifications. For 3' RNA ligation, a pre-adenylated DNA linker with a mixed random hexameric barcode sequence at its 5' end is used. cDNA containing the 3' end sequences of a subset of mature and heavily sequenced snRNAs, snoRNAs, rRNAs, and mitochondrial tRNAs is specifically depleted using biotinylated DNA oligos (Table S1), as described by Ingolia et al. (2012). For subcellular RNA-seq, sequencing libraries are prepared as described in Churchman and Weissman (2012), with ribosomal RNA removed using the Ribo-Zero Magnetic Kit (Epicentre). DNA libraries are sequenced on the Illumina NextSeq 500 and HiSeq 2000 platforms.

Processing and Alignment of Sequencing Reads 

Reads are trimmed and aligned using STAR (v2.4.0) (Dobin et al., 2013). For NET-seq data, only the position matching the 5' end of the sequencing read (after removal of the barcode), which corresponds to the 3' end of the nascent RNA fragment, is recorded with a Python script using the HTSeq package (Anders et al., 2015). Reverse transcription mispriming events are identified and removed when the molecular barcode sequence exactly matches the genomic sequence adjacent to the aligned read. Reads that align to the same genomic position and contain identical barcodes are considered PCR duplicates and are removed. Splicing intermediates have 3' hydroxyls, so they enter NET-seq libraries and contribute reads that align to the exact single-nucleotide 3' ends of introns and exons (Figure S1G). Therefore, reads that map precisely to these single-nucleotide ends of introns and exons are discarded, and these 1-bp genomic positions are excluded from subsequent analysis.
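The read-processing rules above can be sketched in plain Python. This is not the authors' HTSeq script: `AlignedRead` is a hypothetical minimal record, the adjacent genomic sequence is assumed to be precomputed in read orientation next to the read's 5' end, and the strand flip (reads assumed to be the reverse complement of the nascent RNA) is exposed as a parameter because it depends on the library design.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

BARCODE_LEN = 6  # the linker carries a random hexameric barcode

Site = Tuple[str, str, int]  # (chromosome, strand, 0-based position)


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical minimal alignment record (not an HTSeq or STAR type)."""
    chrom: str
    strand: str            # strand the read aligned to: "+" or "-"
    start: int             # 0-based leftmost aligned base
    end: int               # 0-based, exclusive
    barcode: str           # barcode trimmed from the read's 5' end
    adjacent_genomic: str  # genome sequence next to the read's 5' end (assumed)


def split_barcode(read_seq: str, barcode_len: int = BARCODE_LEN) -> Tuple[str, str]:
    """Separate the 5' barcode from the insert before alignment."""
    return read_seq[:barcode_len], read_seq[barcode_len:]


def nascent_3prime_end(read: AlignedRead, flip_strand: bool = True) -> Site:
    """Genomic position of the read's 5' end, reported as the RNA 3' end."""
    pos = read.start if read.strand == "+" else read.end - 1
    strand = read.strand
    if flip_strand:  # assumes reads are the reverse complement of the RNA
        strand = "-" if strand == "+" else "+"
    return read.chrom, strand, pos


def count_3prime_ends(
    reads: Iterable[AlignedRead],
    splice_intermediate_sites: Set[Site],
    flip_strand: bool = True,
) -> Counter:
    """Per-nucleotide NET-seq counts after the three filters described above."""
    counts: Counter = Counter()
    seen = set()
    for read in reads:
        if read.barcode == read.adjacent_genomic:
            continue  # RT mispriming: barcode matches the adjacent genome
        site = nascent_3prime_end(read, flip_strand)
        if (site, read.barcode) in seen:
            continue  # PCR duplicate: same position and same barcode
        seen.add((site, read.barcode))
        if site in splice_intermediate_sites:
            continue  # exact intron/exon 3' end: splicing intermediate
        counts[site] += 1
    return counts


reads = [
    AlignedRead("chr1", "+", 100, 150, "ACGTAC", "TTTTTT"),
    AlignedRead("chr1", "+", 100, 150, "ACGTAC", "TTTTTT"),  # PCR duplicate
    AlignedRead("chr1", "-", 200, 250, "AAAAAA", "AAAAAA"),  # mispriming
]
print(count_3prime_ends(reads, set()))  # Counter({('chr1', '-', 100): 1})
```

Because the barcode is random, two molecules captured at the same nucleotide will usually carry different barcodes, which is what lets the duplicate filter keep genuine pile-ups at strong pause sites.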

Annotation of Exons and Introns 

Clear exonic regions are identified by determining the minimal overlapping exonic region of all isoforms that have an exon at that position. If the region is present in all isoforms, it is considered a constitutive exon; otherwise, it is labeled alternative. Alternative exons that are entirely undetected in the cytoplasmic RNA-seq data are classified as skipped, and the remaining alternative exons are classified as retained. Constitutive intronic regions are identified as the minimal overlapping intronic regions present in all isoforms.
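The exon classification above can be sketched as an interval intersection. The sketch assumes half-open intervals, at most one overlapping exon per isoform, and a cytoplasmic RNA-seq read count already computed over the region; `classify_exon` and its labels are hypothetical names, and the original pipeline may have differed in edge cases.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def classify_exon(
    exon: Interval, isoforms: List[List[Interval]], cytoplasmic_reads: int
) -> Tuple[Optional[Interval], str]:
    """Return the minimal shared exonic region and its class.

    `isoforms` lists each transcript's exons; `cytoplasmic_reads` is the
    cytoplasmic RNA-seq read count over the region (hypothetical input).
    """
    hits = []
    for isoform in isoforms:
        hit = next((e for e in isoform if overlaps(e, exon)), None)
        if hit is not None:
            hits.append(hit)
    if not hits:
        return None, "not_annotated"
    # Minimal region shared by every isoform that has an exon here.
    region = (max(h[0] for h in hits), min(h[1] for h in hits))
    if region[0] >= region[1]:
        return None, "ambiguous"  # overlapping exons share no common bases
    if len(hits) == len(isoforms):
        return region, "constitutive"
    if cytoplasmic_reads == 0:
        return region, "alternative_skipped"
    return region, "alternative_retained"


isoforms = [
    [(0, 50), (100, 200), (300, 400)],
    [(0, 50), (110, 190), (300, 400)],
    [(0, 50), (300, 400)],  # skips the middle exon
]
print(classify_exon((100, 200), isoforms, cytoplasmic_reads=0))
# ((110, 190), 'alternative_skipped')
```

Shrinking each exon to the region shared by all isoforms that contain it avoids counting alternative 5' or 3' splice-site extensions as part of the exon body.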



NET-Seq Exon Metagene and Heatmap Analysis 

Exons included in the analysis are required to lie within genes with an RPKM >1 in gene bodies (defined in Figure 2A) and not to overlap any other annotated exon. They must also begin and end at the same position in all isoforms that contain the exon. First and last exons of genes are removed from analysis. NET-seq signal across each exon ±25 bp is normalized to range between 0 and 1 so that each exon contributes to the analysis with the same weight. The precise single-nucleotide genomic loci where splicing intermediates map (exact 3' ends of introns and exons) are not included in the analysis, and those locations are left blank in all plots.
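The per-exon normalization and the meta-exon summary can be sketched as follows. Masked splice-intermediate positions are represented as `None` so they stay blank; the 95% interval uses a normal approximation (1.96 × SEM), which is an assumption, and the sketch requires profiles that already share a common length, since how exons of different lengths were aligned is not specified here. Function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence, Tuple

Profile = Sequence[Optional[float]]  # None marks masked splice-intermediate positions


def minmax_normalize(profile: Profile) -> List[Optional[float]]:
    """Rescale one exon's signal to [0, 1], leaving masked positions as None."""
    values = [v for v in profile if v is not None]
    if not values:
        return list(profile)
    lo, hi = min(values), max(values)
    span = hi - lo
    return [None if v is None else ((v - lo) / span if span > 0 else 0.0) for v in profile]


def meta_exon(profiles: Sequence[Profile]) -> List[Tuple[Optional[float], Optional[float]]]:
    """Per-position mean and 95% CI half-width (normal approximation)."""
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("profiles must share a common length")
    normalized = [minmax_normalize(p) for p in profiles]
    result = []
    for column in zip(*normalized):
        vals = [v for v in column if v is not None]
        if not vals:
            result.append((None, None))  # blank position in the plot
            continue
        half_width = 1.96 * stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else 0.0
        result.append((mean(vals), half_width))
    return result


print(meta_exon([[0.0, 5.0, None, 10.0], [10.0, 5.0, None, 0.0]]))
```

Min-max scaling discards each exon's absolute density, so this view shows the shape of the pausing pattern; differences in amplitude between exon classes have to be read from unnormalized densities.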

Analysis of Promoter-Proximal Regions 

Promoter-proximal regions were carefully selected for analysis to ensure minimal contamination from transcription arising from other transcription units. We started with Pol II protein-coding genes longer than 2 kb that do not overlap other genes within 2.5 kb upstream of the TSS or 2.5 kb downstream of the polyA site. NET-seq data at their promoter-proximal regions are required to have a coefficient of variation >0.5 and at least 40 covered positions on the sense strand. Within a 4 kb window surrounding the TSS of these genes, peaks were identified on the sense strand. If >40 bases on the antisense strand had NET-seq signal, peaks were also identified on the antisense strand. Promoter regions with an antisense peak located downstream of the major sense peak are classified as displaying convergent transcription. Promoter regions with an antisense peak located upstream of the major sense peak are classified as displaying divergent transcription.
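The filters and the convergent/divergent rule above can be sketched once peak positions are known; peak calling itself is not described and is left out. The sketch assumes the coefficient of variation is computed over per-position sense-strand counts with the population standard deviation, and it works in TSS-relative coordinates (positive = downstream in gene orientation) so minus-strand genes need no special casing. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def to_tss_relative(pos: int, tss: int, gene_strand: str) -> int:
    """Signed distance from the TSS; positive values are downstream of the gene."""
    return pos - tss if gene_strand == "+" else tss - pos


def passes_filters(sense_counts: Sequence[float], min_cv: float = 0.5, min_covered: int = 40) -> bool:
    """Coefficient-of-variation and coverage filters on per-position sense counts."""
    covered = sum(1 for c in sense_counts if c > 0)
    m = mean(sense_counts)
    cv = pstdev(sense_counts) / m if m > 0 else 0.0
    return covered >= min_covered and cv > min_cv


def classify_promoter(sense_peak: int, antisense_peaks: Sequence[int]) -> List[str]:
    """Label a promoter from TSS-relative peak positions (downstream = positive)."""
    labels = set()
    for peak in antisense_peaks:
        if peak > sense_peak:
            labels.add("convergent")  # antisense peak downstream of the sense peak
        elif peak < sense_peak:
            labels.add("divergent")   # antisense peak upstream of the sense peak
    return sorted(labels)


tss, strand = 10_000, "-"
sense = to_tss_relative(9_950, tss, strand)        # 50 bp downstream
antisense = [to_tss_relative(9_800, tss, strand)]  # 200 bp downstream
print(classify_promoter(sense, antisense))  # ['convergent']
```

A promoter can carry both labels when antisense peaks sit on either side of the major sense peak; returning a list keeps that case visible.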
All NET-seq and RNA-seq data sets are available at GEO under accession number GSE61332. DNase-seq data sets are available at ENCODE under ENCODE DCC accession number ENCBS229UDI.
SUMMARY

Pioneer transcription factors (TFs) access silent chromatin and initiate cell-fate changes, using diverse types of DNA binding domains (DBDs). FoxA, the paradigm pioneer TF, has a winged helix DBD that resembles linker histone and thereby binds its target sites on nucleosomes and in compacted chromatin. Herein, we compare the nucleosome and chromatin targeting activities of Oct4 (POU DBD), Sox2 (HMG box DBD), Klf4 (zinc finger DBD), and c-Myc (bHLH DBD), which together reprogram somatic cells to pluripotency. Purified Oct4, Sox2, and Klf4 proteins can bind nucleosomes in vitro, and in vivo they preferentially target silent sites enriched for nucleosomes. Pioneer activity relates simply to the ability of a given DBD to target partial motifs displayed on the nucleosome surface. Such partial motif recognition can occur by coordinate binding between factors. Our findings provide insight into how pioneer factors can target naive chromatin sites.

INTRODUCTION 

Silent chromatin is packed with nucleosomes, which act as a barrier to targeting by most transcription factors (TFs) (Adams and Workman, 1995; Mirny, 2010). However, a select group of TFs known as pioneer factors have the combined ability to access their target sites in silent chromatin and to initiate cell-fate changes (Iwafuchi-Doi and Zaret, 2014; Zaret and Carroll, 2011). The winged-helix DNA binding domain (DBD) of the pioneer factor FoxA (Clark et al., 1993), which is similar to that of linker histone (Ramakrishnan et al., 1993), allows the protein to bind its DNA motif exposed on a nucleosome and to access silent chromatin (Cirillo and Zaret, 1999; Cirillo et al., 1998, 2002). Such activity is necessary for liver induction (Lee et al., 2005). Other TFs involved in cell reprogramming can target their sites in silent chromatin (Montserrat et al., 2013; Soufi et al., 2012; Takahashi and Yamanaka, 2006; Wapinski et al., 2013), but they possess DBDs that differ from that of FoxA. Whether such reprogramming factors directly bind nucleosomes, and how the structures of their respective DBDs relate to nucleosome binding and hence to pioneer activity, has not been assessed.

Transcription factors containing major structural classes of DBDs, including Pit-Oct-Unc (POU), Sry-related High Mobility Group (HMG), Zinc Fingers (ZF), and basic-helix-loop-helix (bHLH), represented by O, S, K, and M, respectively, have been used in the most dramatic example of cellular reprogramming: the conversion of differentiated cells into induced pluripotent stem cells (Takahashi and Yamanaka, 2006). We previously compared genomic chromatin features of human fibroblasts, prior to the ectopic expression of OSKM, with the sites where the factors first bind the genome during their initial expression (Soufi et al., 2012). This allowed us to assess how OSKM target pre-existing chromatin states, as opposed to assessing chromatin states after the factors are bound. The data showed that Oct4, Sox2, and Klf4, but not c-Myc, could function as pioneers during reprogramming by virtue of their ability to target mostly “closed” chromatin sites that are DNase I resistant and “naive” in that they lack evident active histone modifications (Soufi et al., 2012). Recently, single-molecule imaging of fluorescently tagged proteins in living cells suggested that Sox2 guides Oct4 to its target sites (Chen et al., 2014); the chromatin status of these sites was unknown. However, we previously found that ectopic Oct4 and Sox2 bind most extensively to separate sites in chromatin (Soufi et al., 2012), leaving open how the bulk of chromatin targeting is achieved. While many of the initial binding events were promiscuous and not retained in pluripotent cells, many others occurred at target genes that are required for conversion to pluripotency.

Ascl1, Pax7, and Pu.1 have emerged as pioneer transcription factors based on their targeting of closed chromatin and their ability to reprogram cells, though assessments of direct interaction with nucleosomes have been lacking (Barozzi et al., 2014; Budry et al., 2012; Wapinski et al., 2013). Given that the bHLH factor c-Myc is unable to bind closed chromatin on its own (Soufi et al., 2012), it was surprising that Ascl1, another bHLH factor, can bind closed chromatin during the reprogramming of fibroblasts to neuron-like cells (Wapinski et al., 2013). Studies that have examined the correlation between co-existing TF binding and nucleosome occupancy, without characterizing the “pre-bound” chromatin state, could not address questions about initial chromatin access.

Generating induced pluripotent stem (iPS) cells using the OSKM factors has proved highly valuable for research, with great potential for regenerative medicine (Robinton and Daley, 2012). In an attempt to increase the efficiency of reprogramming, efforts have focused on explaining how somatic cells respond to the ectopic expression of OSKM (Buganim et al., 2013; Papp and Plath, 2013; Soufi, 2014). To gain insight into the molecular mechanisms that give OSKM access to closed chromatin, we measured the fundamental interaction between the factors and nucleosomes, in vivo and in vitro, by three mutually supportive approaches: biochemical assays, genomics, and structural analysis. We find that the inherent ability of a DBD to recognize one face of DNA on the nucleosome, as seen by its targeting of part of its canonical motif on nucleosome-enriched sequences in chromatin, is the primary determinant of pioneer factor activity. These findings can explain the pioneer activity of a diverse set of reprogramming factors containing different structural classes of DBDs, as well as the synergistic behavior of pioneer and non-pioneer factors.

RESULTS 

O, S, K, and M Show a Range of Nucleosome Binding 
In Vitro 

The interaction of full-length O, S, K, and M, as used in reprogramming, with nucleosomes is not known. Therefore, we purified and refolded the full-length O, S, and K factors, along with c-Myc and its obligate heterodimerization partner Max, from bacterial cells, representing post-translationally unmodified proteins (Figure 1A; Figure S1A). We also obtained full-length O, S, K, and M expressed in human HEK293 cells and purified under native conditions, representing post-translationally modified versions of the proteins (Figure 1A). To quantify the DNA binding activities of the proteins, apparent equilibrium dissociation constants (Kd) were determined in electrophoretic mobility shift assays (EMSA) using two different methods: from the decrement in the amount of free DNA (total Kd) and from the appearance of the first DNA-bound complex (specific Kd). As expected, the bacterially (bact.) and mammalian (mamm.) expressed recombinant O, S, K, and M proteins bound to DNA probes containing canonical motifs, as previously reported for the purified DBDs (Farina et al., 2004; Nakatake et al., 2006; Rodda et al., 2005) (Figure S1B; Table 1), and bound with much lower affinity to non-specific DNA sequences of the same length (Figure S1C). The reconstituted bact. c-Myc:Max heterodimers formed a complex that migrated more slowly than Max homodimers, and no protein-DNA complexes with mobility similar to that of Max homodimers were observed even at the highest concentrations, confirming that the c-Myc:Max preparation did not contain Max homodimers (Figure S1B). The mamm. c-Myc did not show any specific DNA binding activity in the absence of its partner Max, as seen previously (Wechsler et al., 1994). These data demonstrate that the recombinant full-length OSKM proteins were highly active in specific DNA binding.
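Apparent Kd values from EMSA titrations are typically obtained by fitting fraction bound against protein concentration. The sketch below assumes a single-site hyperbolic isotherm in which free protein is approximately total protein (ligand depletion ignored, which can bias sub-nanomolar estimates) and fits Kd by a log-spaced grid search to stay dependency-free. The function names, grid bounds, and synthetic data are hypothetical; the authors' fitting procedure is not described in this section.

```python
import math
from typing import Sequence, Tuple


def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Single-site isotherm, assuming free protein is ~ total protein."""
    return protein_nM / (protein_nM + kd_nM)


def fraction_bound_from_free(free_signal: float, free_signal_no_protein: float) -> float:
    """'Total' binding estimated from the decrement of the free-probe band."""
    return 1.0 - free_signal / free_signal_no_protein


def fit_apparent_kd(
    conc_nM: Sequence[float],
    bound: Sequence[float],
    lo: float = 1e-3,
    hi: float = 1e4,
    steps: int = 4000,
) -> Tuple[float, float]:
    """Least-squares Kd from a log-spaced grid search; returns (Kd, R^2)."""

    def sse(kd: float) -> float:
        return sum((b - fraction_bound(c, kd)) ** 2 for c, b in zip(conc_nM, bound))

    log_lo, log_hi = math.log10(lo), math.log10(hi)
    grid = (10 ** (log_lo + i * (log_hi - log_lo) / steps) for i in range(steps + 1))
    best_kd = min(grid, key=sse)
    mean_b = sum(bound) / len(bound)
    ss_tot = sum((b - mean_b) ** 2 for b in bound)
    r2 = 1.0 - sse(best_kd) / ss_tot if ss_tot > 0 else float("nan")
    return best_kd, r2


# Synthetic titration generated with Kd = 0.75 nM (illustrative only).
conc = [0.1, 0.3, 1.0, 3.0, 10.0]
bound = [fraction_bound(c, 0.75) for c in conc]
kd, r2 = fit_apparent_kd(conc, bound)
print(f"Kd ~ {kd:.2f} nM, R^2 = {r2:.3f}")
```

The "total" versus "specific" distinction in the text corresponds to which band is quantified: the decrement of free probe (as in `fraction_bound_from_free`) or the intensity of the first shifted complex. The same fitting step applies to either series.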



To measure the direct interactions between OSKM and nucleosomes, we identified a nucleosome-enriched site in the fibroblast genome that is efficiently targeted by OSKM (Soufi et al., 2012), focusing on the LIN28B locus, which is important for reprogramming and pluripotency (Shyh-Chang et al., 2013; Yu et al., 2007). RNA sequencing (RNA-seq) data showed that LIN28B is silent in human fibroblasts and remains silent after 48 hr of OSKM induction, revealing that OSKM binding precedes LIN28B gene activation (data not shown). We selected a region downstream of the LIN28B poly(A) site that is strongly enriched for a nucleosome in pre-induced human fibroblasts, as measured by MNase sequencing (MNase-seq) (Kelly et al., 2012), and that was targeted by all four factors at 48 hr post-induction (Figure 1B). We used PCR on human fibroblast DNA to generate a 162-bp, Cy5-labeled LIN28B-DNA, which was assembled into nucleosomes (LIN28B-nuc) by salt gradient dilution with purified recombinant human histones (Figure S1D). Compared to free DNA, the nucleosomes exhibited protection from low concentrations of DNase I except at the ends of the LIN28B fragment, indicating translational positioning around the center of the 162-bp LIN28B sequence (Figure 1C, top two boxes), similar to the observed position of the center of the MNase-seq peak (Figures 1B and 1C). Ten-fold higher concentrations of DNase I generated an approximately 10-bp DNase cleavage repeat pattern on LIN28B-nuc, reflecting rotational positioning of nucleosomes within the population (Figure 1C, bottom).
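The ~10-bp cleavage repeat that signals rotational positioning can be detected numerically from a cleavage profile by autocorrelation. The sketch below is an illustrative way to do this, not the analysis used for Figure 1C; it assumes a per-nucleotide cleavage intensity vector (for example, peak heights read off an electropherogram) and searches lags of 5-20 bp, both of which are assumptions.

```python
import math
from statistics import mean
from typing import Sequence


def autocorrelation(signal: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a cleavage profile at a given lag (bp)."""
    m = mean(signal)
    num = sum((signal[i] - m) * (signal[i + lag] - m) for i in range(len(signal) - lag))
    den = sum((x - m) ** 2 for x in signal)
    return num / den if den > 0 else 0.0


def dominant_period(signal: Sequence[float], min_lag: int = 5, max_lag: int = 20) -> int:
    """Lag with the highest autocorrelation within [min_lag, max_lag]."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(signal, lag))


# Synthetic cleavage profile with a 10-bp repeat (illustrative only).
profile = [1.0 + math.cos(2 * math.pi * i / 10) for i in range(160)]
print(dominant_period(profile))  # 10
```

Free DNA digested with DNase I should give no strong preferred lag in this range, so comparing the two profiles separates rotational positioning from sequence-intrinsic cleavage bias.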

It is generally accepted that nucleosomes act as a barrier to DNA binding by TFs (see Introduction), though exceptions have been noted (Perlmann and Wrange, 1988). Interestingly, Oct4, Sox2, and Klf4, but not c-Myc:Max, bound LIN28B-nuc (Figure 1D). Remarkably, both mamm. and bact. Oct4 and Sox2 showed similar or lower apparent Kd values for LIN28B-nuc compared to LIN28B-DNA, indicating similar or higher affinity for the nucleosome than for free DNA (Figure 1D; Table 1). On the other hand, Klf4 bound LIN28B-nuc with a higher apparent Kd than free DNA, indicating substantial nucleosome binding, but at a lower affinity than to free DNA (Figure 1D; Table 1). c-Myc:Max did not yield saturated binding to LIN28B-nuc, even at the highest protein concentrations used, and thus its apparent Kd must be in the μM range (Figure 1D; Table 1). In conclusion, both mammalian and bacterially expressed O, S, K, and M exhibit the same relative range of affinities for LIN28B-nuc, and O, S, and K have an independent nucleosome binding activity.

Specific and Non-Specific DNA Interactions Contribute 
to Nucleosome Binding 

It is well recognized that TFs show both sequence-specific and non-specific interactions with their DNA targets (Biggin, 2011). To measure the contribution of sequence specificity to OSK binding to LIN28B nucleosomes, we carried out EMSA in the presence of increasing amounts of specific and non-specific DNA sequences, already characterized above, as competitors (Figures S1B and S1C; Table 1). EMSA competition experiments show that a 40-fold molar excess of unlabeled DNA probes containing specific binding sites, but not of probes containing non-specific sequences, can displace LIN28B-DNA complexes with each of the OSKM proteins, indicating specific interaction with LIN28B-DNA (Figure 2A, left panel), similar to OSKM interaction with their canonical sites (Figure S2A). As expected, bact. and mamm. O, S, or K in complexes with LIN28B-nuc were displaced in the presence of a 40× molar excess of unlabeled specific competitors (Figure 2A, lanes 16, 19, and 22). A 40× or lower (5× to 20×) molar excess of non-specific DNA failed to displace bact. and mamm. Oct4 from LIN28B-nuc (Figures 2A, lane 17, and S2B, lanes 14-16), demonstrating specific binding of Oct4 to nucleosomes in vitro.

Figure 1. O, S, K, and M Display Differential Affinity to Nucleosomes In Vitro

(A) Recombinant purified mammalian and bacterial O, S, K, M, and bacterial Max (X) proteins analyzed by SDS-PAGE and Coomassie staining. The respective OSKMX bands run at the expected sizes when compared to protein standards. The OSKM DNA binding activity and specificity are shown in Figures S1A-S1C.

(B) O, S, K, and M ChIP-seq profiles (blue, red, orange, and green, respectively) 48 hr post-induction and MNase-seq profile (black) in fibroblasts across the LIN28B locus within the displayed genomic location.

(C) DNase I footprinting showing the protection of LIN28B-DNA before and after nucleosome reconstitution in vitro. Electropherograms of 5'-6FAM end-labeled LIN28B (top strand) oligonucleotides generated by digesting free DNA (blue) and nucleosomal DNA (red) with DNase I. The amount of DNase I used is indicated on top of each panel. Shaded boxes represent the DNase I-protected regions within LIN28B-nuc in the expected ~10-bp pattern. See Figure S1D for details about nucleosome reconstitution.

(D) Representative EMSA showing the affinity of increasing amounts of recombinant O, S, K, and M proteins (bact., top panels; mamm., bottom panels) for Cy5-labeled LIN28B-DNA (left panels) and LIN28B-nucleosome (right panels). EMSA of O, S, K, and M with DNA probes containing specific and non-specific targets are shown in Figures S1B and S1C.
Table 1. Recombinant O, S, K, and M Show a Range of Affinities to Nucleosomes

Apparent Kd (nM)

                        Oct4          Sox2          Klf4          c-Myc
                        bact.  mamm.  bact.  mamm.  bact.  mamm.  bact.  mamm.
Total canonical         0.61   0.64   0.37   0.98   2.49   1.46   1.88   ND
Specific canonical      0.76   1.04   0.45   1.50   3.18   1.95   0.77   ND
Total LIN28B DNA        0.75   0.93   0.38   1.46   1.25   0.41   8.28   ND
Specific LIN28B DNA     0.92   2.05   0.68   3.83   2.26   1.12   6.25   ND
Total LIN28B nuc.       1.09   1.34   0.34   1.06   5.96   3.45   ND     ND
Specific LIN28B nuc.    1.17   1.84   0.39   1.43   7.21   13.97  ND     ND

Apparent dissociation constants (Kd) were derived from EMSA to represent the relative affinities of bacterial (bact.) and mammalian (mamm.) O, S, K, and M for their canonical sites, LIN28B free DNA, and LIN28B nucleosomes (nuc.). Apparent Kd values were derived from two separate binding curves representing two experimental replicates, fitted to the experimental data (goodness-of-fit values of ~0.97), and are expressed in nM. Apparent Kd values were quantified either from the fractional decrement of free DNA or nuc., designated “total” binding, or from the first bound DNA/nuc. complexes, representing “specific” binding.

ND, not determined.
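One simple way to read Table 1 is the ratio of the nucleosome Kd to the free-DNA Kd for each factor: a ratio near or below 1 means comparable or tighter binding to the nucleosome, and a ratio well above 1 means weaker binding. The sketch below computes this from the "total" rows, with values copied from Table 1 and ND encoded as `None`. The ratio is an illustrative summary, not a statistic reported by the authors.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (factor, expression system)

# "Total" apparent Kd values (nM) copied from Table 1; None stands for ND.
TOTAL_LIN28B_DNA: Dict[Key, Optional[float]] = {
    ("Oct4", "bact."): 0.75, ("Oct4", "mamm."): 0.93,
    ("Sox2", "bact."): 0.38, ("Sox2", "mamm."): 1.46,
    ("Klf4", "bact."): 1.25, ("Klf4", "mamm."): 0.41,
    ("c-Myc", "bact."): 8.28, ("c-Myc", "mamm."): None,
}
TOTAL_LIN28B_NUC: Dict[Key, Optional[float]] = {
    ("Oct4", "bact."): 1.09, ("Oct4", "mamm."): 1.34,
    ("Sox2", "bact."): 0.34, ("Sox2", "mamm."): 1.06,
    ("Klf4", "bact."): 5.96, ("Klf4", "mamm."): 3.45,
    ("c-Myc", "bact."): None, ("c-Myc", "mamm."): None,
}


def nuc_to_dna_ratio(
    dna: Dict[Key, Optional[float]], nuc: Dict[Key, Optional[float]]
) -> Dict[Key, Optional[float]]:
    """Kd(nucleosome) / Kd(free DNA); > 1 means weaker binding to the nucleosome."""
    ratios: Dict[Key, Optional[float]] = {}
    for key, kd_dna in dna.items():
        kd_nuc = nuc.get(key)
        ratios[key] = None if kd_dna is None or kd_nuc is None else kd_nuc / kd_dna
    return ratios


for (factor, system), ratio in nuc_to_dna_ratio(TOTAL_LIN28B_DNA, TOTAL_LIN28B_NUC).items():
    print(factor, system, "ND" if ratio is None else f"{ratio:.2f}")
```

On these numbers, both Sox2 preparations fall below 1, both Klf4 preparations sit above 4, and Oct4 lies in between at roughly 1.4, which matches the ordering described in the text.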



By contrast, a 40× excess of non-specific DNA competed almost all of Sox2 and Klf4 from binding to LIN28B-nuc (Figure 2A, lanes 20 and 23). Importantly, lower levels of non-specific competitor, from 5× to 20×, did not compete with LIN28B-nuc for binding either Sox2 or Klf4 to the same extent as specific competitor (Figures S2C and S2D, compare lane 10 to lanes 11-13 versus 14-16). Thus, both specific and non-specific interactions contribute to Sox2 and Klf4 binding to nucleosomes in vitro.

DNase footprinting showed that each of the O, S, K, and M factors protects sequences on LIN28B free DNA that resemble its canonical motif (Figures 2B and 2C, dashed boxes). In addition, at the concentrations used for footprinting, Sox2, Klf4, and c-Myc also show non-specific protection of LIN28B free DNA (Figure 2B, peaks labeled by asterisks). DNase footprinting of LIN28B-nuc bound to Oct4 and Sox2 shows that the factors protect part of their canonical motifs, in agreement with the specific binding to nucleosomes seen in EMSA competition experiments (Figures 2B and 2C). However, Sox2 and Klf4 protect both specific and non-specific nucleotides on LIN28B-nuc, supporting the non-specific contribution to Sox2 and Klf4 nucleosome binding seen in EMSA competition experiments (Figure 2B). The Klf4 binding site is close to the predicted nucleosome dyad axis, where DNase cleavage is minimal, which precludes an accurate assessment of specific footprinting. As expected, c-Myc showed minimal protection of LIN28B-nuc, confirming its weak affinity for nucleosomes. Altogether, the O, S, and K reprogramming factors employ specific and non-specific nucleosome interactions to different extents.

Range of Nucleosome Binding In Vitro Is Observed in 
Genome Targeting In Vivo 

We assessed whether OSKM, 48 hr post-induction, targeted sites with pre-existing nucleosome enrichment in fibroblast chromatin. Pooling seven replicates from the MNase-seq data set (GSM543311) yielded a high-resolution nucleosome map with 6.6-fold genome coverage. First, we curated the sites targeted by O, S, K, or M alone by identifying O, S, K, or M peaks that are 500 bp or more apart from each other. The sites were arranged in rank order by the number of chromatin immunoprecipitation sequencing (ChIP-seq) tags in the central 200 bp, from high- to low-affinity sites. This analysis confirms that each of the O, S, K, and M factors is highly enriched at the central 200 bp within a 2-kb region (Figure 3A, blue boxes). Interestingly, Sox2 bound alone most frequently (n = 41,107) compared to Oct4 (n = 22,495), Klf4 (n = 28,212), and c-Myc (n = 23,885). Subsequently, MNase tags across the respective 2-kb regions were counted, reflecting local nucleosome enrichment. Read-density heatmaps showed a range of nucleosome enrichment at the central 200-bp regions that were targeted by O, S, K, or M alone (Figure 3A, red boxes). Notably, Oct4 targets were the most highly enriched for nucleosomes, followed by Sox2 and then Klf4, throughout the respective TF rank-ordered binding profiles. By contrast, MNase tags at the c-Myc targeted sites were diminished. We also did not observe pre-phased arrays of nucleosomes at OSKM target sites, indicating that the initial association with nucleosomes precedes repositioning, if any. Remarkably, the extent of nucleosome targeting by O, S, K, and M in vivo correlates with the relative abilities of the factors to bind nucleosomes in vitro (Figure 1D; Table 1).
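The curation of singly bound sites and their rank ordering by central ChIP-seq tags can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: peaks are hypothetical (chromosome, summit, factor) tuples, a site is treated as "alone" when no peak of a different factor lies within 500 bp of its summit, and tags are given as sorted read positions per chromosome. The helper names `solo_peaks`, `central_tag_count`, and `rank_by_central_tags` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def solo_peaks(peaks, min_gap=500):
    """Keep peaks with no peak of a different factor closer than min_gap bp.

    peaks: list of (chrom, summit, factor) tuples (hypothetical input format).
    """
    by_chrom = defaultdict(list)
    for chrom, summit, factor in peaks:
        by_chrom[chrom].append((summit, factor))
    solo = []
    for chrom in sorted(by_chrom):
        entries = sorted(by_chrom[chrom])
        summits = [s for s, _ in entries]
        for summit, factor in entries:
            # Slice covers summits strictly closer than min_gap on either side.
            lo = bisect_right(summits, summit - min_gap)
            hi = bisect_left(summits, summit + min_gap)
            if all(f == factor for _, f in entries[lo:hi]):
                solo.append((chrom, summit, factor))
    return solo


def central_tag_count(summit, sorted_tags, half_window=100):
    """Count tags in the half-open window [summit - 100, summit + 100), i.e. 200 bp."""
    return (bisect_left(sorted_tags, summit + half_window)
            - bisect_left(sorted_tags, summit - half_window))


def rank_by_central_tags(sites, tags_by_chrom):
    """Order sites from most to fewest tags in the central 200 bp."""
    return sorted(
        sites,
        key=lambda site: central_tag_count(site[1], tags_by_chrom.get(site[0], [])),
        reverse=True,
    )


peaks = [("chr1", 1000, "Oct4"), ("chr1", 1300, "Sox2"), ("chr1", 5000, "Klf4"),
         ("chr1", 9000, "Sox2"), ("chr1", 9500, "Oct4")]
tags = {"chr1": [4950, 5010, 5050, 5090, 9010, 9480, 9520, 9560, 9600]}
ranked = rank_by_central_tags(solo_peaks(peaks), tags)
print(ranked)
```

Here the Oct4 and Sox2 peaks 300 bp apart are excluded, while the Sox2 and Oct4 peaks exactly 500 bp apart are both kept. The same counting helper could, in principle, be applied to MNase-seq read positions to obtain a per-site nucleosome signal.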

To assess the contribution of non-specific binding in vivo, we counted the number of O, S, K, and M peaks at 48 hr post-induction as a function of the false discovery rate (FDR) threshold. Remarkably, while the O, K, and M peak numbers begin to stabilize above an FDR of 0.5% (the threshold used in our study) (slopes of 1.6, 1.5, and 1.3, respectively), the number of Sox2 peaks continues to increase (slope of 2.1) at higher FDR (Figure S3A). Thus, it appears that Sox2 employs a measure of non-specific targeting in vivo, as we observed in vitro.
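The slope comparison can be illustrated with an ordinary least-squares fit of peak number against FDR threshold, restricted to thresholds at or above 0.5%. This is a hedged sketch: the text does not state how the slopes in Figure S3A were computed (for example, whether counts or thresholds were log-transformed), and the threshold series below is made up for illustration, not data from the study.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_above_threshold(fdr_percent, peak_counts, min_fdr=0.5):
    """Slope of peak count vs. FDR (%), using only thresholds >= min_fdr."""
    pairs = [(f, c) for f, c in zip(fdr_percent, peak_counts) if f >= min_fdr]
    if len(pairs) < 2:
        raise ValueError("need at least two FDR thresholds at or above min_fdr")
    xs, ys = zip(*pairs)
    return ols_slope(xs, ys)


# Hypothetical peak counts (thousands) for a saturating and a still-growing factor.
fdr = [0.1, 0.5, 1.0, 2.0, 5.0]
saturating = [18.0, 20.0, 20.5, 21.0, 21.5]
growing = [30.0, 35.0, 38.0, 44.0, 62.0]
print(slope_above_threshold(fdr, saturating), slope_above_threshold(fdr, growing))
```

A factor whose peak count keeps climbing as the threshold is relaxed yields the steeper slope, which is the pattern the text attributes to Sox2.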

O, S, K, and/or M Synergistic Targeting of Nucleosomes
In Vivo and In Vitro

It has been previously suggested that transcription factors can access nucleosomal DNA by binding cooperatively in order to compete with histones (Polach and Widom, 1996). To investigate the contribution of synergy between O, S, K, and/or M to nucleosome targeting, we studied sites that were co-targeted by multiple factors within 100 bp or less of each other, i.e., within one nucleosome. In general, we observed that targets of all possible O, S, K, and/or M combinations were enriched for nucleosomes except for KM targets, and that co-bound sites were, on average, more enriched for nucleosomes than singly bound sites (Figures 3B and S3B). Notably, there were more S, K, and/or M combinations that included Oct4 and showed higher nucleosome enrichment at initially targeted sites, compared to binding combinations lacking Oct4 (Figures 3B and S3, compare C-I to J-M). For example, c-Myc showed the most nucleosome targeting when co-bound with Oct4, followed by Sox2, whereas c-Myc showed weak nucleosome targeting with Klf4 (Figure S3, compare E to K and M). Interestingly, the KM combination was the most frequent at nucleosome-depleted promoters, similar to KM targeting of DNase hypersensitive regions (Soufi et al., 2012) (Figure S3M, red plot). Nevertheless, KM still targeted nucleosome-enriched sites at TSS-distal regions (Figure S3M, blue plot).
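One way to label co-targeted sites by their OSKM combination is sketched below. The grouping rule is an assumption: summits on the same chromosome are chained together when consecutive summits lie within 100 bp (single linkage), which the text does not specify for groups of three or four factors. Input tuples and helper names are hypothetical.

```python
from collections import defaultdict

# One-letter codes used in the text for the four reprogramming factors.
FACTOR_CODES = {"Oct4": "O", "Sox2": "S", "Klf4": "K", "c-Myc": "M"}


def _label(chrom, group):
    """Summarize a group of (summit, factor) pairs as (chrom, combination, start, end)."""
    present = {FACTOR_CODES[factor] for _, factor in group}
    combo = "".join(code for code in "OSKM" if code in present)
    return (chrom, combo, group[0][0], group[-1][0])


def co_bound_combinations(peaks, max_dist=100):
    """Group summits whose consecutive gaps are <= max_dist bp (single linkage)
    and return only groups containing two or more distinct factors."""
    by_chrom = defaultdict(list)
    for chrom, summit, factor in peaks:
        by_chrom[chrom].append((summit, factor))
    groups = []
    for chrom in sorted(by_chrom):
        entries = sorted(by_chrom[chrom])
        current = [entries[0]]
        for summit, factor in entries[1:]:
            if summit - current[-1][0] <= max_dist:
                current.append((summit, factor))
            else:
                groups.append(_label(chrom, current))
                current = [(summit, factor)]
        groups.append(_label(chrom, current))
    return [g for g in groups if len(g[1]) > 1]


peaks = [("chr1", 1000, "Oct4"), ("chr1", 1060, "Sox2"), ("chr1", 3000, "Klf4"),
         ("chr1", 3090, "c-Myc"), ("chr1", 6000, "Oct4"),
         ("chr2", 500, "Oct4"), ("chr2", 580, "c-Myc")]
print(co_bound_combinations(peaks))
```

Single linkage can chain summits into a group wider than 100 bp; a stricter alternative would require every pair in a group to lie within 100 bp.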

To further investigate synergistic targeting with Oct4, we assessed binding by each of the bacterially expressed Sox2, Klf4, and c-Myc:Max (1 nM) to the reconstituted LIN28B-nuc (2 nM) in the presence of low amounts of Oct4 (0.3 nM). EMSA showed that all three recombinant proteins are able to bind with Oct4 to nucleosomal DNA in vitro, forming higher-order complexes (Figure 3C). Notably, c-Myc:Max binding to LIN28B-nuc was enabled in the presence of Oct4 (Figure 3C, right panel). To assess the presence of LIN28B-nuc histones in the complexes, we transferred the proteins from an EMSA gel to a polyvinylidene fluoride (PVDF) membrane and blotted for H3 and H2B (Figure S4). Though the c-Myc antibody was the weakest, all LIN28B-nuc-bound complexes showed detectable amounts of H3, and to a lesser extent H2B, indicating that the factors bind together to nucleosomes. In summary, Oct4, Sox2, and Klf4 enable c-Myc to target nucleosomal sites both in vivo and in vitro.

O, S, and K Separately Recognize Partial Motifs on 
Nucleosomes 

To identify DNA motifs associated with nucleosome targeting by O, S, or K alone in vivo, the respective targeted sites were rank ordered according to nucleosome enrichment in the central 200 bp. This allowed us to separate nucleosome-enriched from nucleosome-depleted regions that were individually targeted by O, S, or K. By these criteria, 85%, 80%, and 65% of the genomic sites initially targeted by Oct4, Sox2, and Klf4, respectively, were enriched for nucleosomes (Figures 4A-4C, red boxes). We used de novo motif analysis, analyzing the targets that were enriched for nucleosomes (Figures 4A-4C, red boxes, upper portion) separately from those that were depleted of nucleosomes, i.e., free DNA targets (Figures 4A-4C, red boxes, lower portion). While O, S, and K primarily targeted sequences similar to their canonical motifs at both nucleosome-depleted and nucleosome-enriched sites, the motifs occurring at nucleosome-enriched sites showed distinctive features (Figures 4D-4F).
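The partitioning step can be sketched as a rank-and-split on the MNase-seq tag count in the central 200 bp. The cut-off is an assumption: the text places the boundary at a dashed line in Figures 4A-4C but does not state its value, so `threshold` below is a hypothetical parameter, and the site names and counts are made up.

```python
def partition_by_nucleosome_signal(site_ids, mnase_counts, threshold):
    """Split sites into nucleosome-enriched and nucleosome-depleted groups.

    site_ids: iterable of site identifiers.
    mnase_counts: mapping site -> MNase-seq tags in the central 200 bp.
    threshold: hypothetical cut-off; sites with counts >= threshold are 'enriched'.
    Both returned lists keep the descending rank order by MNase signal.
    """
    ranked = sorted(site_ids, key=lambda s: mnase_counts[s], reverse=True)
    enriched = [s for s in ranked if mnase_counts[s] >= threshold]
    depleted = [s for s in ranked if mnase_counts[s] < threshold]
    return enriched, depleted


def percent_enriched(enriched, depleted):
    """Percentage of all sites that fall in the nucleosome-enriched group."""
    total = len(enriched) + len(depleted)
    return 100.0 * len(enriched) / total


counts = {"site1": 42, "site2": 3, "site3": 17, "site4": 25, "site5": 0}
enriched, depleted = partition_by_nucleosome_signal(counts, counts, threshold=10)
print(enriched, depleted, percent_enriched(enriched, depleted))
```

As a check against the counts reported in the next paragraph, 19,120 nucleosome-enriched out of 19,120 + 3,375 = 22,495 Oct4-alone sites gives 85.0%.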

Strikingly, while Oct4 targeted its canonical octamer sequence at nucleosome-depleted sites (~49% of n = 3,375), it targeted hexameric motifs resembling one or the other half of the octamer motif at nucleosome-enriched sites (42% and 28%, respectively, of n = 19,120) (Figure 4D). Sox2 targeted its canonical HMG box motif at nucleosome-depleted sites (64% of n = 8,221), whereas at nucleosome-enriched sites it targeted a more degenerate motif lacking the sixth “G” nucleotide (~74% of n = 32,886) (Figure 4E, arrowhead). Finally, Klf4 alone targeted its nonameric motif at nucleosome-depleted sites (94% of n = 9,874), whereas it targeted a hexameric motif missing the three terminal nucleotides at nucleosome-enriched sites (90% of n = 18,338) (Figure 4F, see dashed lines).

These findings agree with the above DNase footprinting of 
LIN28B-nuc bound to the factors (Figure 2B, right panels), with 
Oct4 and Sox2 protecting a part of their canonical motifs on 
one side of the LIN28B-nuc DNA (Figures 2B and 2C; right). On 
free DNA, Klf4 protected the first three nucleotides of its motif 
on the upper strand while protecting the remaining six nucleo- 
tides of its motif on the bottom strand (Figure S5A). However, 
Klf4 did not protect the first three nucleotides on the upper 
strand of LIN28B-nuc, as they were not exposed to DNase I 
digestion, indicating that Klf4 may be interacting with part of its 
motif exposed on the other strand (Figures 2B and 2C). 

These data show that the O, S, and K factors can independently target nucleosomes using partial or degenerate motifs, and that each factor targets its full canonical motif in the absence of nucleosomes at a target site. Analysis of nucleosomal sites co-targeted by OS or OK also reveals partial motifs for each of the factors (data not shown).

The Molecular Basis for O, S, and K Nucleosomal 
Targeting 

To define the molecular basis that governs O, S, and K interactions with nucleosomal DNA, we interrogated the three-dimensional structures of O, S, and K DBDs in complexes with their canonical motifs deposited in the RCSB Protein Data Bank. Oct4 contains a bipartite POU domain, composed of an N-terminal POU-specific domain (POUs) and a C-terminal POU-homeodomain (POUhd), separated by a linker region. The X-ray structure of the Oct4-POU-DNA complex confirms that POUs and POUhd each bind one half of the octameric motif on DNA (Esch et al., 2013) (Figure 4G, lower panels). The truncated POUs and POUhd can bind their respective half-motif DNA probes in vitro, independently of each other (Verrijzer et al., 1992). Interestingly, the isolated DNA-bound state of either POUs or POUhd occupies less than half of the DNA surface around the circumference of the double helix (DNA surface occupied, 606 and 718 Å², respectively), leaving the opposite DNA surface solvent-exposed and potentially free to interact with histones in a nucleosome conformation (Figure 4G, red dashed arrows in upper panels). However, once both POUs and POUhd are bound to the full motif (1,321 Å²), less than a quarter of the DNA circumference is solvent-exposed, which would be incompatible with nucleosome binding due to steric hindrance (Figure 4G, red dashed arrow in lower panel). Thus, the two POU domains do not target directly adjacent half sites on nucleosomes, as they do on free DNA, but exposure of the separate half sites on nucleosomes is sufficient for initial Oct4 targeting.

Figure 2. The Contribution of Non-Specific Binding to Nucleosome Targeting In Vitro

(A) Representative EMSA showing the affinity of recombinant O, S, K, and M proteins (bacterially expressed, top panels; mammalian-expressed, bottom panels) for LIN28B-DNA (left panels) and LIN28B-nucleosome (right panels) in the presence of a 40-fold molar excess of specific competitor (“s” lanes) or non-specific competitor (“n” lanes), or in the absence of competitor (“−” lanes). Competition assays showing the specificity of O, S, K, and M for their canonical DNA probes and for LIN28B DNA and nucleosome under lower titration of competitor are shown in Figure S2.

Sox2 binds DNA through its HMG box, inducing a sharp bend and widening of the minor groove (Remenyi et al., 2003) (Figure 4H, lower-left panel). Our motif analysis showed that Sox2 targets a degenerate motif within nucleosomes, missing the “G” nucleotide at the sixth position (Figure 4E). This “G” nucleotide is positioned at the angle of the induced bend and makes direct contacts with the N46 residue at the N-terminal tail of Sox2-HMG (Remenyi et al., 2003) (Figures 4E and 4H, arrowhead). Remarkably, mutation of this single amino acid (N46Q) within Sox2-HMG results in a significant decrease in DNA-bending ability without affecting DNA binding (Scaffidi and Bianchi, 2001). In transient transfection assays, the Sox2-N46Q mutant displays higher transactivation activity from the Fgf4 enhancer than wild-type Sox2 (Scaffidi and Bianchi, 2001). Furthermore, mutation of the “G” nucleotide at the sixth position of the motif has the unique ability, among all mutations tested, to abolish DNA bending by wild-type Sox2 (Scaffidi and Bianchi, 2001). Together, these data indicate that Sox2 would not induce extensive DNA distortion when targeting the nucleosomal motif, since that motif lacks the “G” nucleotide. To further support these observations, we superimposed the 3D structures of DNA bound by wild-type Sox2 and by the Sox2-N46Q mutant on nucleosomal DNA and, after 1,000 cycles of refinement, calculated the root-mean-square deviation (RMSD) between the phosphate backbones for the best fit as a measure of their average distance. This analysis reveals that the less distorted DNA is more compatible with nucleosomal DNA (RMSD = 0.86 Å) than the extensively distorted DNA (RMSD = 6.83 Å) (Figure 4H, right panel). In conclusion, our data indicate that Sox2 engages nucleosomes by recognizing a degenerate motif that involves less DNA distortion, better matching the curvature and widened minor groove of DNA around the histone octamer.
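The RMSD comparison above rests on superimposing matched phosphate coordinates and measuring their average deviation. The text does not name the software or its iterative refinement, so the sketch below swaps in the standard Kabsch algorithm (optimal rigid-body superposition by singular value decomposition) as an illustration; it assumes NumPy is available and that the two coordinate arrays are hypothetical, one-to-one matched N x 3 sets of phosphate positions.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (same units as the input, e.g. Å) between matched point sets P and Q
    (N x 3) after the optimal rigid-body superposition of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy check: a rotated and translated copy of the same points superimposes exactly.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3)) * 10.0
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(P, Q))  # close to 0 (floating-point rounding only)
```

A lower value means the bound DNA geometry can be laid onto nucleosomal DNA with less deviation, which is the sense in which 0.86 Å indicates greater compatibility than 6.83 Å.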

Klf4 recognizes the nonameric DNA motif using all three C2H2-type ZFs (three nucleotides per ZF) located at its C terminus (Schuetz et al., 2011) (Figure 4F). However, we identified a hexameric motif, lacking the last three nucleotides, enriched within nucleosomal targets (Figure 4F, 90%). Mutagenesis studies have shown that the hexameric motif represents the minimal essential binding site for Klf4 (Shields and Yang, 1998). Recently, X-ray crystallography has revealed the structures of Klf4 bound to the hexameric and nonameric sites (Schuetz et al., 2011) (Figure 4I). Klf4 uses the two most C-terminal of its three ZFs to recognize the hexameric motif, occupying one side of the DNA double helix (595 Å²) and leaving more than half of the opposite surface potentially free to interact with histones in a nucleosome (Figure 4I, red dashed arrow in upper-right panel). Klf4 bound to the nonameric motif with all three ZFs covers more than half of the DNA surface (847 Å²) and would hinder binding to nucleosomes (Figure 4I, red dashed arrow in lower-right panel). This analysis suggests that Klf4 employs two of its three ZFs to engage nucleosomes.

Interestingly, the observed adaptability of O, S, and K to 
recognize partial motifs correlates with the apparent flexibility 
of their respective DBDs that we modeled during their transition 
from the DNA-free to the DNA-bound states (Figures S5B-S5G). 



c-Myc Recognizes a Partial Motif Enriched on 
Nucleosomes through Co-Binding with Other Factors 

Using the partitioning method in Figures 4A-4C, a subset of c-Myc targeted sites (23%, n = 5,494) were enriched for nucleosomal DNA, while the majority of sites (77%, n = 18,391) did not exhibit enrichment (Figure 5A). Motif analysis revealed that c-Myc nucleosomal targets were enriched for an E-box motif that is degenerate at the two central nucleotides (CANNTG) compared to the canonical E-box (CACGTG) (Figure 5B, double arrowheads in top panel). By contrast, nucleosome-depleted targets were enriched for a less degenerate E-box motif that we and others have previously reported to be associated with c-Myc binding at enhancers (Lin et al., 2012; Nie et al., 2012; Soufi et al., 2012) (Figure 5B, single arrowhead in bottom panel). Interestingly, c-Myc-alone (i.e., without OSK) nucleosomal targets were additionally enriched for a homeobox motif (73%) highly similar to the POUhd motif, compared to nucleosome-depleted sites (48%) (Figure 5C). Likewise, the majority of c-Myc sites co-targeted with Oct4 (76%, n = 2,219) that are enriched for nucleosomes contain a centrally degenerate E-box motif similar to that identified in nucleosomal c-Myc-alone targets (Figures 5D and 5E). The separate halves of the POU motif were also enriched at the OM targeted sites, indicating that Oct4 uses one or the other DBD while co-binding with c-Myc (Figure 5F). In conclusion, c-Myc targets nucleosomal sites together with O, S, or K, or with endogenous homeodomain factors, recognizing a centrally degenerate E-box motif.
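To make the distinction between the canonical E-box (CACGTG) and the centrally degenerate E-box (CANNTG) concrete, the sketch below counts sequences containing each and tallies the central dinucleotides. It is a simple string scan, not the de novo motif discovery used in the study, and it assumes DNA sequences given on one strand. Because CANNTG is palindromic as a pattern, a match on one strand implies a match on the other, although the central dinucleotide read from the opposite strand is the reverse complement. The example sequences are made up.

```python
import re
from collections import Counter

# Lookahead so that overlapping E-boxes are all reported; group 1 = central NN.
EBOX = re.compile(r"(?=CA([ACGT]{2})TG)")


def central_dinucleotides(seq):
    """List the central NN of every CANNTG occurrence on the given strand."""
    return [m.group(1) for m in EBOX.finditer(seq.upper())]


def ebox_summary(sequences):
    """Fraction of sequences with a canonical CACGTG, fraction with any CANNTG,
    and a tally of central dinucleotides across all occurrences."""
    tally = Counter()
    n_canonical = n_any = 0
    for seq in sequences:
        centers = central_dinucleotides(seq)
        tally.update(centers)
        n_any += bool(centers)
        n_canonical += "CG" in centers
    n = len(sequences)
    return n_canonical / n, n_any / n, tally


seqs = ["AACACGTGAA", "TTCAGCTGTT", "GGGGGGGGGG", "CACATGCACGTG"]
print(ebox_summary(seqs))
```

A set of targets dominated by "CG" in the tally reflects canonical E-box usage, whereas a broad spread of central dinucleotides is the signature of the centrally degenerate motif.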

In the absence of DNA, the basic region of the bHLH domain appears to be unfolded in solution (Sauve et al., 2004) (Figure 6A; Figure S6A). Upon DNA binding, the basic region folds as an extension of helix-1, hereafter referred to as basic-helix-1 (bH) (Nair and Burley, 2003) (Figures 6D and S6B, blue helices). Notably, the four most conserved nucleotides of the E-box (CANNTG) face the interaction interface between bHLH and DNA, while the two degenerate central nucleotides (CANNTG) face the exterior of the DNA helix (Figure 6B, see cyan and magenta arrowheads). Molecular morphing of the transition from the DNA-free to the DNA-bound state indicates that bH follows a gradual folding trajectory across the major groove of DNA (Figures 6A-6D and S6B). The interaction between a partially folded bHLH and CANNTG drives the initial recognition of the E-box without contacts to the central nucleotides (NN), resulting in the centrally degenerate E-box motif that we observed for c-Myc at nucleosome-enriched sites (Figure 6B).

Importantly, the partially folded c-Myc occupies only half of the DNA helix surface, leaving the other half solvent-exposed and potentially nucleosome compatible (Figure 6B, red dashed arrow). The partially folded c-Myc-DNA complex thus appears to require further assistance from other factors, such as Oct4 or other homeodomain-containing proteins, to remain associated with DNA. The interaction between a partially folded bHLH and a centrally degenerate E-box motif has been observed by X-ray crystallography for Mitf, which shares 86% sequence homology with c-Myc across the basic region (Figure S6C) (Pogenberg et al., 2012). Once fully folded, the c-Myc bHLH adopts a rigid structure, stabilizing DNA binding and resulting in a less degenerate E-box motif, which would be incompatible with nucleosomes (Figure 6D). We conclude that partially unfolded c-Myc targets a centrally degenerate E-box motif, thereby adapting to a nucleosome template when assisted by other factors.

(B) DNase I footprinting showing the protection of LIN28B-DNA (left panels) and LIN28B-nuc (right panels) in the absence (blue lines) or presence (red lines) of O, S, K, and M. Electropherograms of 5'-6FAM end-labeled LIN28B (top strand) oligonucleotides generated by DNase I digestion of DNA (0.006 U) and nucleosomal DNA (0.06 U). Dashed boxes and stars represent specific and non-specific sites protected by O, S, K, and M, respectively.

(C) A cartoon representation of the 162-bp LIN28B DNA (left) and nucleosome (right) highlighting the binding sites of O, S, K, and M in vitro in blue, red, orange, and green, respectively, as measured by DNase I footprinting. The protected DNA sequences are indicated.

Figure 3. O, S, K, and M Display a Range of Nucleosome Targeting In Vivo

(A) Read density heatmaps (in color scales) showing the intensity of O, S, K, and M ChIP-seq signal (blue) and MNase-seq signal (red) spanning ±1 kb from the center of the O, S, K, and M peaks where each factor binds alone within a 500-bp threshold. The analyzed sequences were organized in rank order, from high to low number of ChIP-seq reads within the central 200 bp (double arrows). The number of targeted sites is indicated.

(B) As in (A), but showing sites where OS, OK, and OM peaks are within 100 bp or less of each other. The full set of possible OSKM combinations is shown in Figure S3.

(C) The binding affinity of S, K, and M (1 nM) for LIN28B nucleosomal DNA in the presence (lanes 4, 6, and 8, respectively) or absence (lanes 3, 5, and 7) of Oct4 (0.3 nM). The binding of Oct4 on its own (lane 2) and free LIN28B nucleosomes (lane 1) are indicated. The histone content of the nucleosome-bound complexes is shown in Figure S4.

Figure 4. O, S, and K Recognize Partial Motifs on Nucleosomes

(A-C) Same as in Figure 3A, but the sites were organized in descending rank order according to the MNase-seq tags within the central 200 bp. The nucleosome-enriched sites were separated from the nucleosome-depleted sites (dashed line) for each factor.

(D-F) Logo representations of de novo motifs identified in the O, S, and K nucleosome-enriched targets (top) and nucleosome-depleted targets (bottom). The motifs were aligned to canonical motifs (middle). The number of targets analyzed and the percentage of motif enrichment are indicated.

(G-I) Cartoon representations of the 3D structures of O (PDB-3L1P), S (PDB-1GT0), and K (PDBs-2WBS and 2WBU) DBDs in complexes with DNA containing canonical motifs. Side and top views are shown for O and K, and dashed curved arrows represent the extent of exposed DNA surface (G and I). The 3D structures of the less distorted DNA (top) and extensively distorted DNA (bottom) were superimposed on nucleosomal DNA (PDB-3LZ0, gray) to display the extent of Sox2-nucleosome binding compatibility by measuring the RMSD of the fit.

Figure 5. c-Myc Recognition of Degenerate E-Box on Nucleosome Is Assisted by Binding with Co-Factors

(A-F) Same as shown in Figures 4A-4F, but for c-Myc alone and OM targets. (C) The enrichment of an associated motif (HD) is measured within c-Myc-alone targets containing or depleted of nucleosomes. The data indicate that c-Myc is driven to a degenerate E-box on nucleosomes, in part, by homeodomain

Predicting Pioneer Activity among Different bHLH 
Factors in Reprogramming 

To gain insight into how bHLH proteins may differentially target nucleosomes in reprogramming, we examined the 3D structures of DNA complexes of a range of bHLH factors that have been used in reprogramming experiments (Longo et al., 2008; Ma et al., 1994; El Omari et al., 2013). Interestingly, the basic helix-1 of the different bHLH domains extends across the DNA helix to variable extents (Figures 6E-6I). Motif analysis was also carried out on genomic sites bound by these factors using available ChIP-seq data. Notably, consistent with our findings on c-Myc, the length of the bH α-helix negatively correlates with the degeneracy of the central nucleotides (CANNTG) of the de novo motifs that we identified for each factor (Figures 6E-6I).

To further test this correlation, we examined the recent finding that the bHLH factor Ascl1 can act as a pioneer factor during reprogramming of fibroblasts to neurons (Wapinski et al., 2013). We measured nucleosome enrichment in pre-induced mouse embryonic fibroblasts (MEFs) at Ascl1 initial targets in MEFs after 48 hr of induction (Teif et al., 2012; Wapinski et al., 2013). Unlike c-Myc, the majority of Ascl1 sites (73%, n = 3,019) were enriched for nucleosomes (Figure S6D). Importantly, the basic helix-1 of Ascl1 is considerably shorter than that of c-Myc, leaving more of the DNA surface solvent-exposed (Figure 6E). Similar to c-Myc, Ascl1 nucleosomal targets were enriched (99.3%) for an E-box motif with two degenerate central nucleotides (CANNTG), compared to the E-box seen in 98.7% of sites depleted of nucleosomes (Figure S6E). Ascl1 nucleosomal targets contain an extra “G” nucleotide at the 3' end of the E-box motif, which is missing at the nucleosome-depleted sites, resulting in more specific targeting of nucleosomes despite the centrally degenerate E-box (Figures 6E and S6E).

By molecular modeling, Ascl1 and Olig2 exhibited the shortest bH regions, compared to the longer bHs in the X-ray crystal structures of NeuroD, MyoD, and Tal1. To verify that the observed bH lengths were not due to the methodology, we examined the amino-acid composition of the basic regions in all of these bHLH factors (Figure 6J). The bH-DNA interaction is mainly driven by positively charged residues (hence the name basic). Interestingly, the Ascl1 bH ends at the last basic residue (arginine) at its N-terminal end, which is positioned closer to the C terminus than in the other factors (Figures 6J and 6R, residues in blue boxes). The last basic residue of Olig2-bH falls between those of Ascl1 and the rest of the factors. In conclusion, the basic helix-1 of pioneer bHLH factors such as Ascl1 is intrinsically shorter, allowing the factors to bind nucleosomes more efficiently.

Figure 6. The Folding Extent of bHLH Basic Helix-1 on DNA Anti-Correlates with Targeting Centrally Degenerate E-Box Motifs on Nucleosomes

(A-D) The folding trajectory of basic helix-1 of c-Myc upon DNA binding, showing the possible conformations of c-Myc:Max heterodimers (B and C) that are compatible with nucleosome binding. See Figure S6A for the c-Myc morph. The initial DNA-free state (A) and the fully folded DNA-bound state (D), which is incompatible with nucleosome binding, are indicated. The associated motifs for each c-Myc:Max conformation are shown on the left. See Figure S6B for the Mitf structure in complexes with E-boxes with variable central nucleotides.

(E-I) Cartoon representations of various bHLH reprogramming factors in complexes with DNA containing their canonical motifs (right). The de novo motifs identified for each factor from ChIP-seq data are indicated (left). The cyan and pink arrows represent the positions of the exposed nucleotides within the central E-box motif that do not make base contacts with the respective bHLH conformation. The central two nucleotides (CANNTG) are colored purple in the DNA cartoon. The color scheme of the bHLH along with the leucine zipper (LZ) is shown at the bottom.

(J) Alignment of amino-acid sequences of the basic region of Ascl1, Olig2, NeuroD, MyoD, Tal1, and c-Myc. The last basic residue at the C-terminal end is highlighted in blue. See Figures S6D and S6E for MNase enrichment and motif analysis of Ascl1.

DISCUSSION

The introduction of a defined set of TFs, such as OSKM, into differentiated cells can result in cell-fate conversion (Takahashi and Yamanaka, 2006), and yet it has been clear that the different factors make different contributions, or have different “strengths,” in cell-type conversion. This provided the basis for our effort to tackle the long-standing problem of how TFs initially target their sites in closed chromatin. The pioneer factor theory partly answers this question by suggesting that a select group of TFs, such as FoxA, access closed chromatin by a direct interaction with nucleosomal DNA through a DBD that resembles the structure of a linker histone (Zaret and Carroll, 2011; Iwafuchi-Doi and Zaret, 2014). We previously found that the diverse DBDs of O, S, K, and M, which are structurally different from a linker histone, have differential abilities to access closed chromatin (Soufi et al., 2012). Here, we revealed that the relative tendencies of O, S, K, and M to initially target nucleosomal sites in reprogramming reflect their inherent ability to bind nucleosomes in vitro and their ability to recognize partial motifs on nucleosomes in vivo. This differs from what was observed for FoxA1, which recognizes the same motif on free DNA and nucleosomes (Cirillo et al., 1998; Li et al., 2011). Factors that cannot bind nucleosomes on their own, such as c-Myc, associate with other factors to target degenerate E-boxes on nucleosomes. Our approach contrasts with previous predictions of pioneer factors, which fitted fully folded DBDs, in their naked DNA-bound state, onto nucleosomes through docking.

We found that the bipartite POU domain of Oct4 can target partial motifs exposed on nucleosomes using the separate POUs or POUhd domains. The single motif targeted by each domain is longer than each half of the octamer motif, thus providing greater binding specificity than a half motif. In addition, mass spectrometry analysis has identified histones as interacting partners of Oct4 in mouse ES cells (Pardo et al., 2010), indicating an additional affinity contribution from protein-histone interactions. The Pax family of bipartite-domain TFs can bind DNA using both domains and still occupy only half of the DNA surface, and would therefore be compatible with nucleosome binding (Garvie et al., 2001; Xu et al., 1999) (Figure S7, right, compared to POU TFs). This agrees with the finding that Pax7 is a pioneer factor that uses full motif recognition during initial targeting (Budry et al., 2012). Thus, bipartite TFs have to either employ one DBD or position both DBDs on the same surface of DNA in order to interact with nucleosomes. Notably, the pioneer activity of a zebrafish homolog of an Oct protein was observed during the maternal-to-zygotic transition (Lee et al., 2013; Leichsenring et al., 2013), suggesting that targeting nucleosomal sites may be a general mechanism for de novo programming of the genome.

The high affinity of Sox2 for nucleosomes may be due to the pre-bent conformation of nucleosomal DNA, which widens the DNA minor groove and favors initial minor groove sensing. While bending naked DNA requires minimal work by Sox2 (Privalov et al., 2009), the energy cost would impede Sox2 from further bending DNA on nucleosomes. We find that Sox2 would not further bend nucleosomal DNA because it recognizes a partial motif that diminishes the extreme bending of the full motif. Sox family members share recognition of the core motif but display diverse preferences outside the core on naked DNA (Badis et al., 2009). Our findings reveal greater flexibility in Sox2 core motif preferences on nucleosomes than was previously recognized. In addition, we showed evidence for both specific and nonspecific binding by Sox2 in vitro and in vivo. The stable, motif-driven targeting by Sox2 on nucleosomes in the ChIP-seq data shows much lower co-binding with Oct4 (Soufi et al., 2012) than that seen in live imaging (Chen et al., 2014), leaving open whether the latter approach depicts nucleosomal or free DNA binding during genome scanning.

Klf4 showed higher affinity for free DNA than for nucleosomes in vitro, and its initial targets in vivo were enriched for nucleosomes, though less so than those of Oct4 and Sox2. Klf4 targets nucleosomes in vivo using two of its three zinc fingers, recognizing a hexameric motif. This explains why the affinity of Klf4 for nucleosomes is lower than that for free DNA. The pioneer factor GATA4 binds nucleosomes modestly in vitro (Cirillo and Zaret, 1999) and targets a hexameric motif in vivo (Zheng et al., 2013). Notably, GATA4 contains only two zinc fingers. Glis family zinc finger 1 (Glis1) greatly enhances reprogramming when co-expressed with OSK (Maekawa et al., 2011). Interestingly, despite containing five ZFs, Glis1 employs only two ZFs (numbers four and five) to recognize its targets (Pavletich and Pabo, 1993). The repressor ZFP57/Kap1, which is known to be associated with closed chromatin, also recognizes a hexameric motif despite containing an array of seven zinc fingers (Quenneville et al., 2011). This suggests that zinc finger proteins in general may use two zinc fingers to initially target hexameric motifs exposed on nucleosomes. Klf4 also showed non-specific interactions with nucleosomes, suggesting a genome-searching mechanism similar to that of Sox2.

Overexpression of bHLH factors, including c-Myc, Tal1, and Olig2, has been reported in various cancers (Lin et al., 2012; Nie et al., 2012; Palii et al., 2011; Sanda et al., 2012; Suva et al., 2014). In all of these cases, the bHLH factors have been associated with degenerate E-box motifs and co-binding with other factors. We propose that the extent to which basic helix-1 lies on DNA and co-binds with pioneer factors is reflected in the recognized motif, predicting the ability of a bHLH factor to bind nucleosomes and access closed chromatin. Interestingly, mutation of two amino acids within the basic helix-1 that interacts with the central E-box enables the non-myogenic bHLH factor E12 to convert fibroblasts to muscle cells (Davis and Weintraub, 1992). The homeodomain factor PBX primes MyoD targets to induce myogenic potential (Maves et al., 2007). Furthermore, the hematopoietic TAL1-E47 heterodimer employs one of its two bHLH domains, using LMO2 as an adaptor, to interact with GATA1 (El Omari et al., 2013). Hence, in addition to their intrinsic structures, co-binding of bHLH factors with DNA-binding and non-DNA-binding proteins appears to be involved in stabilizing the interaction of the partially folded bHLH factors with nucleosomes. These features are relevant to the multitude of bHLH factors functioning in development, cancer, and reprogramming experiments.

The differential ability of TFs to recognize their target sites on nucleosomes supports a hierarchical model in which pioneer factors are the first to gain access to their targets in silent chromatin. We also observe that initial targeting can occur for non-pioneer proteins when they bind in conjunction with pioneer factors, which allow the former to engage, through their DBDs, a reduced motif that is compatible with nucleosome binding. Further studies are needed to understand the secondary events that lead to subsequent changes in local chromatin structure and the formation of large complexes at gene regulatory sequences. By understanding the mechanistic basis by which certain transcription factors are especially capable of initiating cell-fate changes, we hope to modulate the process and ultimately control cell fates at will.

EXPERIMENTAL PROCEDURES 
Protein Expression and Purification 

We made the bacterial expression plasmids pET-28B-huOct4, pET-28B-hu- 
Sox2, pET-28B-huKlf4, and pET-28B-huMyc encoding the full-length human 
O, S, K, M, respectively, fused to an N-terminal 6x histidine tag. The recombinant
proteins were expressed in E. coli Rosetta (DE3) pLysS (Novagen #70956-3)
and purified using a nickel-charged column under denaturing conditions.
The mammalian expressed human OSKM recombinant proteins were obtained 
from OriGene (Oct4 #TP311998, Sox2 #TP300757, Klf4 #TP306691, c-Myc 
#TP301611). See Extended Experimental Procedures for more details of this 
and following sections. 

Nucleosome Reconstitution 

The 162-bp LIN28B DNA fragment was created by PCR with end-labeled
primers. The fluorescent-tagged DNA fragments were gel extracted and 
further purified using ion-exchange liquid chromatography by MonoQ (GE 
Healthcare). The nucleosomes were reconstituted by mixing purified human 
H2A/H2B dimers and H3/H4 tetramers with LIN28B DNA at a 1:1 molar ratio of
histone octamer:DNA using a salt-urea gradient.

DNA Binding Reactions 

Cy5 end-labeled DNA containing specific or non-specific sites, LIN28B DNA,
and LIN28B nucleosomes were incubated with recombinant proteins in
10 mM Tris-HCl (pH 7.5), 1 mM MgCl2, 10 μM ZnCl2, 1 mM DTT, 10 mM
KCl, 0.5 mg/ml BSA, 5% glycerol at room temperature for 60 min. Free and
bound DNA were separated on 4% non-denaturing polyacrylamide gels run
in 0.5× Tris-borate-EDTA and visualized using a Phosphorimager. The
intensity of Cy5 fluorescence was quantified using Multi-Gauge software
(Fujifilm Science Lab) to generate binding curves for Kd analysis.
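How a binding curve yields a Kd can be illustrated with a minimal sketch. It assumes a simple single-site (1:1) isotherm, fraction bound = [P]/(Kd + [P]), with fraction bound taken as bound/(bound + free) band intensity and protein in large excess over labeled DNA. The titration values are hypothetical, and the golden-section search over log10(Kd) is a stand-in for whatever fitting Multi-Gauge or the authors actually used.

```python
import math


def fraction_bound(protein_nm: float, kd_nm: float) -> float:
    """Single-site binding isotherm; assumes [protein] >> [labeled DNA]."""
    return protein_nm / (kd_nm + protein_nm)


def fit_kd(concs_nm, fractions, lo_nm=1e-3, hi_nm=1e5, iters=200):
    """Least-squares Kd via golden-section search over log10(Kd).

    Assumes the sum of squared errors is unimodal in log10(Kd) on [lo_nm, hi_nm].
    """
    def sse(log_kd: float) -> float:
        kd = 10.0 ** log_kd
        return sum((f - fraction_bound(c, kd)) ** 2 for c, f in zip(concs_nm, fractions))

    a, b = math.log10(lo_nm), math.log10(hi_nm)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, ~0.618
    for _ in range(iters):
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if sse(x1) < sse(x2):
            b = x2  # minimum lies in [a, x2]
        else:
            a = x1  # minimum lies in [x1, b]
    return 10.0 ** ((a + b) / 2.0)


# Hypothetical titration; in practice each fraction would come from gel-band intensities.
concs = [1, 3, 10, 30, 100, 300, 1000]                # nM protein (illustrative)
fractions = [fraction_bound(c, 42.0) for c in concs]  # noise-free synthetic data
print(f"fitted Kd ~ {fit_kd(concs, fractions):.1f} nM")
```

With real gels the fractions are noisy, and a cooperative (Hill) model may fit better than this single-site form; the sketch only shows the principle.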

DNase footprinting was carried out by treating free DNA or nucleosomes, 
6FAM 5' end-labeled, with DNase I (Worthington) in the absence or presence 
of TFs. The end-labeled digested fragments were separated by capillary elec- 
trophoresis in ABI 96-capillary 3730XL Sequencer (Applied Biosystems). 

Genomic Data Analysis 

The O, S, K, and M ChIP-seq aligned data along with the called peaks (FDR- 
controlled at 0.005) were obtained from GEO (GSE36570) (Soufi et al., 2012). 
The MNase-seq data (GSM543311) (Kelly et al., 2012) were aligned to build 
version NCBI36/HG18 of the human genome, and seven replicates were
pooled, generating 145,546,004 unique reads. The MNase-seq reads
were extended to 150 bp to cover one nucleosome, resulting in 6.6-fold
genome coverage.
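The coverage figure follows from simple arithmetic: mean fold coverage = (number of reads × extended read length) / genome length. The back-of-the-envelope check below is not the authors' pipeline; because the genome length used is not stated, the 3.3 × 10^9 bp value is an assumption, and the second function shows the effective genome length implied by the reported 6.6-fold.

```python
def fold_coverage(n_reads: int, extended_len_bp: int, genome_len_bp: float) -> float:
    """Mean depth: total bases covered by extended reads divided by genome length."""
    return n_reads * extended_len_bp / genome_len_bp


def implied_genome_length(n_reads: int, extended_len_bp: int, coverage: float) -> float:
    """Genome length that would yield the stated mean coverage."""
    return n_reads * extended_len_bp / coverage


READS = 145_546_004  # pooled unique MNase-seq reads (from the text)
EXTEND_BP = 150      # reads extended to about one nucleosome (from the text)

print(f"{fold_coverage(READS, EXTEND_BP, 3.3e9):.2f}-fold at an assumed 3.3 Gb")
print(f"{implied_genome_length(READS, EXTEND_BP, 6.6) / 1e9:.2f} Gb implied by 6.6-fold")
```

The reads cover roughly 21.8 Gb in total, so the reported 6.6-fold is consistent with an effective genome length of about 3.3 Gb.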

Motif analysis was carried out using the MEME-ChIP suite v.4.9.1 available at
http://meme.nbcr.net (Machanick and Bailey, 2011). 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures and 
seven figures and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2015.03.017.
SUMMARY

We address the mechanism by which adult intestinal 
stem cells (ISCs) become localized to the base of 
each villus during embryonic development. We find 
that, early in gut development, proliferating progeni- 
tors expressing ISC markers are evenly distributed 
throughout the epithelium, in both the chick and 
mouse. However, as the villi form, the putative stem 
cells become restricted to the base of the villi. This 
shift in the localization is driven by mechanically 
influenced reciprocal signaling between the epithe- 
lium and underlying mesenchyme. Buckling forces 
physically distort the shape of the morphogenic field, 
causing local maxima of epithelial signals, in partic- 
ular Shh, at the tip of each villus. This induces a 
suite of high-threshold response genes in the under- 
lying mesenchyme to form a signaling center called 
the “villus cluster.” Villus cluster signals, notably 
Bmp4, feed back on the overlying epithelium to ulti- 
mately restrict the stem cells to the base of each 
villus. 

INTRODUCTION 

Although studies of stem cells have revealed a great deal about 
maintenance and propagation, the origin of most adult stem cell 
populations remains an open question. Intestinal stem cells 
(ISCs) have been particularly well studied. A number of important 
factors have been described as being produced in the ISC 
niche to maintain their multipotency and proliferative potential, 
including canonical Wnt signaling (Spence et al., 2011). The iden- 
tification of genetic ISC markers in the adult intestine, such as 
Lgr5, has made it possible to identify the location of these cells. 
In the adult, Lgr5-positive ISCs reside in the intestinal crypt, found
below the base of each villus (Barker et al., 2007). The earliest known
expression of Lgr5 is just after birth in mouse (Kim et al., 2012). At
this time, Lgr5 is expressed at the base of each villus, where the 
crypt will soon form. However, the expression patterns of this 
and other adult stem cell markers in amniote embryos have
not been systematically studied, and indeed, whether or not 
Lgr5-positive cells are even present prior to birth has remained 
uncertain. 

It is clear, however, that morphological villi arise before birth 
(or hatching in birds). Perhaps surprisingly, although stem cell 
proliferation and differentiation are critical for homeostatic 
maintenance of the villi, the initial formation of the villi does 
not appear to be a stem-cell-dependent phenomenon, at least 
in the chick. Morphogenesis of the lumen of the chick gut 
occurs in a stepwise progression wherein the initially smooth 
lining of the primitive gut tube is first transformed by compres- 
sive forces into a series of longitudinal parallel ridges. These 
are then deformed into a series of regular zigzag ridges. 
Finally, the zigzags segment to give rise to individual villi (Coulombre
and Coulombre, 1958; Shyer et al., 2013) (Figure S1A).
A similar process occurs in the formation of human villi (Hilton, 
1902; Lacroix et al., 1984). The formation of the ridges is driven 
by the differentiation of the first circumferential smooth muscle 
layer of the intestine. This forms a barrier restricting further 
expansion as the inner submucosal and endodermal layers 
continue to proliferate, resulting in their buckling. Similarly, 
the zigzags form due to compressive forces generated 
by further submucosal and endodermal growth when the sec- 
ond longitudinal smooth muscle layer differentiates, creating 
orthogonal barriers to expansion in both the longitudinal and 
radial directions. Finally, the arms of the zigzags each give 
rise to individual villi as the third, innermost layer of longitudinal 
smooth muscle differentiates in the context of a decrease in 
proliferation along the top of the zigzags (Shyer et al., 2013). 
This previous study thus addressed the mechanism by which 
villi first form in the developing chick gut. However, this work 
raises the question of why proliferation suddenly drops at the
tips of the folds at the zigzag stage and also leaves unan- 
swered the critical question of how stem cells are localized 
to the base of the villi as they form. These issues are the focus
of the current study.
Figure 1. Intestinal Stem Cell Markers Are Expressed Uniformly in
the Early Mammalian Embryo and Are Refined during Development 

(A) Lgr5-EGFP-positive cells in heterozygous mouse intestines from E12.5 
to E15.5. High-magnification views (below) show progressive restriction of 
expression from the villus tip. Sections from a littermate control lacking the 
knock-in allele (right column) show no GFP expression. 

(B) CD44 immunohistochemistry in mouse intestines from E12.5 to E15.5. 
High-magnification views (below) show similar progressive restriction of 
expression from the villus tip. 

(C) Sections of the intestine from two different P0 mice that resulted from
crossing the Lgr5 knock-in allele containing an inducible Cre with a
Rosa26-tdTomato floxed reporter after tamoxifen induction at E13.5. GFP represents
Lgr5 expression at P0, and tdTomato indicates the location of cells and their
descendants that expressed Lgr5 during induction at E13.5. Scale bars,
50 μm.

RESULTS 

Intestinal Stem Cell Markers Are Expressed Uniformly in 
the Early Mammalian Gut and Are Refined during Villus 
Formation 

Although the definitive ISCs of the postnatal intestine are derived 
from the endoderm of the primitive gut tube and the early gut 
epithelium has been hypothesized to be a uniform stem-cell- 
like pool (Crosnier et al., 2006), it has remained unclear whether 
ISC markers are expressed at these early stages. To test this, we 
took advantage of a murine GFP knock-in allele of the best-studied
ISC marker, Lgr5 (EGFP-IRES-creERT2) (Barker et al., 2007).
Strikingly, Lgr5-expressing cells are found throughout the
epithelium in the embryonic day 12.5 (E12.5) small intestine,
just prior to villus formation (Figure 1A). Over the following days
of development, Lgr5 expression is lost in the forming villus
tip and is progressively restricted to the space between villi as
they form (Figure 1A). A second ISC marker, CD44 (Itzkovitz
et al., 2012), follows a similar progression, albeit with slightly
delayed kinetics (Figure 1B).

In the adult intestine, canonical Wnt signaling is essential for 
maintaining ISCs. In previous studies, markers for active Wnt 
signaling, such as Sox9, have been reported to be initially ex- 
pressed uniformly throughout the embryonic gut but are then 
restricted to the intervillous space as villi form (Blache et al., 
2004; Formeister et al., 2009; Furuyama et al., 2011). Moreover, 
previous reports have shown that epithelial proliferation follows 
the same progressive restriction from the tip of forming villi 
(Crosnier et al., 2006). 

These data suggest that the ISCs localized at the base of the villi 
at birth are remnants of a broader precursor stem cell population 
found throughout the early gut endoderm. To directly test whether 
this is the case, we made use of the inducible Cre present in the 
Lgr5 knock-in allele and crossed it into the background of a 
Rosa26-tdTomato floxed reporter that is irreversibly activated in 
the presence of Cre recombinase, marking the cells in which 
Cre is expressed and also their descendants. We labeled cells 
by inducing Cre activity at E13.5, a stage when the entire epithelium
is proliferative and expresses Lgr5. We then sectioned guts of
postnatal animals, a time when stem cells are localized to the base 
of the villi and to the inter-villus regions, and examined them for 
tdTomato expression. We observed staining at the base of the villi 
that colocalized with Lgr5 expression and, in many cases, also 
saw staining along the sides of the villi (Figure 1C) even though, 
at this stage, the epithelial cells of the villi do not actively express 
Lgr5. As the epithelial cells of the villi at this stage are known to be 
derived from the stem cells at their base, these data indicate that 
the embryonically labeled Lgr5-positive cells are indeed the pro- 
genitors of the post-natal intestinal stem cells. 

Although the villi in mouse appear to be established through 
similar compressive forces as in the chick (Shyer et al., 2013), 
they arise much more quickly and without the clear stepwise progression
seen in the chick (Figure S1B). To investigate when in
this process the stem cells are localized, we therefore switched 
systems to the chick. 

Stem Cells Are Restricted Late in Chick Endodermal 
Morphogenesis as Zigzags Become Compact and Begin 
to Morph into Pre-villus Bulges 

As Lgr5 expression is difficult to detect in the developing chick 
midgut, we utilized single-molecule fluorescent in situ hybridiza- 
tion (FISH) to locate Lgr5-expressing cells across chick intestinal 
development. Lgr5 is expressed uniformly throughout the early 
embryonic intestinal epithelium and continues as such through 
the early stages of epithelial morphogenesis into ridges and 
zigzags (Figures 2A and S2A). However, by E15, as the zigzags
attain their maximal compaction just before they begin to morph 
into the bulges that will give rise to villi, Lgr5 expression is dimin- 
ished in the tip of the epithelial fold. By hatching, expression is 
Figure 2. Restriction of Progenitor Identity Is Observed during the
Slower Progression of Villus Formation in Chick 

(A) Quantification of single LGR5 mRNA molecules per unit length across the
base, middle, and tip of epithelial folds over time (quantifications were done on
at least three gut samples for each stage). Data are represented as mean ± 1
SD. See also Figure S2.

(B) Immunofluorescence for Sox9 in the chick intestine across development,
from E13, when expression is uniform in the epithelium, through E15, when Sox9
is restricted from the tips of the folds, and at hatch, when Sox9 is expressed
predominantly in the intervillous space. Scale bars, 50 μm.



predominantly limited to the intervillous space (Figures 2A and 
S2A). Similarly, Sox9 is expressed uniformly in the early chick
intestinal epithelium and is lost at the tips of the folds by E15 (Figure
2B). Thus, the localization of both putative stem cells and of
the Wnt signaling that supports them becomes restricted just 
before the pre-villus bulges start to emerge. We have previously 
noted that this transition from zigzags to bulges also correlates 
with and, indeed, depends upon a progressive restriction of 
proliferation from the tips of the folded luminal surface at E15
(Shyer et al., 2013). 

A Signaling Center Correlating with the Timing and 
Localization of Stem Cells in the Forming Gut 

A potential clue for how epithelial proliferation and stem cell iden- 
tity might be regulated comes from the mouse, where lack of pro- 
liferation at the villus tip has previously been correlated with the 
presence of a signaling center in the distal mesenchyme of the 
nascent villi, called the “villus cluster” (Karlsson et al., 2000), 
which expresses PDGFRa, Gli1, Ptc1, Bmp2, and Bmp4 (Karlsson
et al., 2000; Walton et al., 2012). In the chick, we find that
the same suite of genes is expressed at a high level in the equivalent
location at the tip of the highly folded epithelium at E15,
although the same genes are expressed at a lower level at earlier 
time points in a narrow band directly under the entire epithelium 
(Figure 3A). The time when the villus cluster genes are upregu- 
lated is the same stage as when the overlying distal epithelium 
loses stem cell marker expression and as when proliferation decreases
in the distal domain of the epithelium (Shyer et al., 2013).



The chick villus cluster includes cluster-specific expression of 
Foxf1, a transcription factor implicated in villus formation (Ormestad
et al., 2006) but not previously observed in the cluster, as
well as PDGFRa, Ptc1, and Bmp4 (Figure 3A). We also examined
phospho-SMAD staining, as a reporter of Bmp activity, during 
chick gut morphogenesis. Phospho-SMAD reactivity is identified 
with a timing that correlates with the onset of high-level Bmp 
expression in the villus cluster and negatively correlates with 
the localization of Lgr5 expression (Figure 3B). 

It has recently been shown that, in mouse, the villus cluster 
expression of Bmp4 and the general Shh target Ptc1 are downstream
of hedgehog signaling (Walton et al., 2012; Ormestad
et al., 2006). Moreover, it has long been known that, at earlier
stages in chick gut formation, Sonic hedgehog (Shh) is responsible
for inducing expression of Bmp4 in the underlying mesenchyme
(Roberts et al., 1995). Accordingly, we find that, in the
chick, villus cluster-specific expression of Ptc1 and Bmp4, as
well as Foxf1, is lost upon inhibition of Hedgehog signaling by
cyclopamine and expanded in response to additional Shh protein
(Figure 4A). As expected, the decrease or increase of Bmp4 
expression, in response to cyclopamine or Shh, respectively, is 
reflected by a concomitant respective loss of or broadening of 
phospho-SMAD reactivity (Figure 4B). 

A Feedback Loop from the Villus Cluster to the 
Epithelium Localizes Stem Cells to the Base of the 
Forming Villi 

To test whether signals from the villus cluster, in fact, direct the 
fate of cells in the neighboring epithelium, we excised a small 
segment of intestine from an E14 chick embryo, when progenitors
are uniformly distributed and before the villus cluster has formed, 
and manipulated cluster signals in vitro during 36 hr of culture. 
Control cultures display EdU labeling, indicative of
proliferation, exclusively at the base of the fold, just like their
E15.5 in vivo counterparts (Figure 4C). However, culturing in the
presence of the hedgehog inhibitor cyclopamine or the Bmp 
inhibitor Noggin results in expansion of proliferation throughout 
the endoderm, including the villus tips (Figure 4C). Conversely, 
in explants cultured in the presence of Shh or Bmp4, proliferation
is absent not just from the tips of the villi, but from the entire endo- 
dermal layer. As shown above, Shh activity is responsible for 
inducing Bmp4 expression in the underlying mesenchyme. To 
confirm this epistatic relationship in this context, we simulta- 
neously treated cultures with both Shh and Noggin. Application 
of both Shh and Noggin to gut segments in culture mimics the 
effects of Noggin alone, maintaining proliferation throughout the 
endoderm (Figure 4C). Thus, as expected, endodermally derived 
Shh activity is upstream of mesenchymal Bmp4 expression, and 
Bmp4 activity represses endodermal proliferation. 

Wnt signaling is an important niche signal for maintaining ISCs 
in the mature intestine. Moreover, mouse mutants with loss of 
villus cluster signals show an expansion of Wnt expression (Mad- 
ison et al., 2005; Ormestad et al., 2006), suggesting that the 
presence of Bmp signaling at the tips of the villi may lead to 
the observed loss of ISCs in the overlying epithelium by reducing 
Wnt activity. Blocking Shh or BMP signaling resulted in uniform 
staining of the Wnt target Sox9 throughout the gut epithelium, 
whereas control gut tissue only showed Sox9 expression in the 
lower half of the villi (Figure 4D). Conversely, in explants cultured
in the presence of Shh or Bmp4, Sox9 is absent in the endo- 
dermal layer (Figure 4D). 

To directly verify that this signaling cascade regulates ISC 
restriction, we assessed the expression of the ISC marker Lgr5 
in the presence of repressed Shh activity. As anticipated, when 
cyclopamine is added, abolishing villus cluster gene expression, 
the resulting intestine segments maintain expression of Lgr5 
throughout the folded epithelium, whereas expression is lost at 
the tip in control segments (Figures 4E and S2B). 

Together, these results support a model in which Shh activity 
in the gut endoderm induces villus cluster gene expression in the 
subadjacent mesenchyme at the tips of the villi. This signal cen- 
ter then produces Bmp4, which reciprocally feeds back on the 
endoderm to block Wnt activity and hence repress ISC identity 
and cell proliferation at the distal end of the growing villi. 
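The epistatic logic of the explant experiments can be summarized with a toy Boolean model. This is a sketch, not a model from the study: it assumes that in untreated tissue only mesenchyme under a fold tip receives high-level Shh, that exogenous Shh or Bmp4 acts everywhere, and that cyclopamine and Noggin fully block Hedgehog reception and Bmp activity, respectively. The names `Treatment` and `epithelium_proliferates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    cyclopamine: bool = False  # blocks Hedgehog signal reception
    shh: bool = False          # exogenous Shh: high Hedgehog signal in all mesenchyme
    noggin: bool = False       # Bmp antagonist
    bmp4: bool = False         # exogenous Bmp4: Bmp signal everywhere


def epithelium_proliferates(region: str, t: Treatment) -> bool:
    """Toy model: epithelial Shh -> mesenchymal Bmp4 -| epithelial Wnt/Sox9 -> proliferation."""
    # Untreated, only mesenchyme under the fold tip sees high-level Shh.
    high_hh = (region == "tip" or t.shh) and not t.cyclopamine
    bmp_active = (high_hh or t.bmp4) and not t.noggin
    wnt_active = not bmp_active  # Bmp represses Wnt targets such as Sox9
    return wnt_active


CONDITIONS = {
    "control": Treatment(),
    "cyclopamine": Treatment(cyclopamine=True),
    "Noggin": Treatment(noggin=True),
    "Shh": Treatment(shh=True),
    "Bmp4": Treatment(bmp4=True),
    "Shh + Noggin": Treatment(shh=True, noggin=True),
}

for name, t in CONDITIONS.items():
    print(f"{name:13s} tip={epithelium_proliferates('tip', t)} "
          f"base={epithelium_proliferates('base', t)}")
```

Each row reproduces the qualitative outcome described for the cultures; in particular, Shh + Noggin behaves like Noggin alone, which is the epistasis test placing Bmp4 downstream of Shh.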

Physical Changes in the Morphology of the Lining of the 
Gut Create Local Maxima of Signaling Activity to Induce 
the Villus Cluster 

Figure 3. As the Proto-villi Form from E13 to
E15, the Villus Cluster Signaling Center
Forms in the Mesenchyme at the Distal Tip

(A) Luminal views of the zigzag topography from
E13 to E15. Expression of cluster genes shifts
from uniform under the wide folds of the epithelium
at E13 (left) to predominantly localized to the
mesenchyme under the forming villi at E15 (right).

(B) PhosphoSMAD staining demonstrates high
BMP activity in the villus cluster and the adjacent
epithelium. Close-up views (below) of a single fold
at E15 highlight epithelial staining (arrowhead),
which is less intense than staining in the
mesenchymal cluster. Scale bars, 50 μm.

There is, however, an obvious problem with this model: we have
shown that Shh is expressed uniformly throughout the gut endoderm
at the stages of development under
consideration (Figures 3B and 4A), yet the 
putatively Shh-dependent villus cluster 
genes are only induced at the distal tips 
of the villi. A plausible model explaining 
this localized, elevated response to Shh 
takes note of the fact that a uniformly 
secreted protein will be at a higher con- 
centration in locations where the target 
tissue is surrounded by morphogen-pro- 
ducing tissue (e.g., at the curved tip of 
the highly folded epithelium) than where 
it is only adjacent to the source of the 
morphogen on one side (e.g., at the 
base of the folds). This is supported by 
computational modeling, which shows 
that a highly folded epithelium, or finger- 
like pocket, indeed results in both an in- 
crease in a morphogen concentration 
gradient and a greater depth of high-level 
signaling below the endoderm, relative to 
a similarly scaled, wider fold (Figure S3 and Extended Experi- 
mental Procedures). The slow, stepwise nature of villi formation 
in chick allows for a detailed investigation of this hypothesis for 
how the villus cluster arises. During the stages in which the 
lumen takes on an increasingly compact zigzag topography 
(E13, E14, and E15) we find that the cross-sectional shape of 
these structures changes in concert (low peak, narrow peak, 
and rounded tip, respectively) (Figures 5A and 5B). This would 
be predicted to lead to increasingly concentrated gradients 
of endodermally derived signaling at the tip (schematized in 
Figure 5B). 
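The geometric argument can be illustrated numerically. The sketch below treats the epithelium as a line source of uniform strength and sums each short segment's contribution to a point in the mesenchyme using an exponential decay kernel exp(-r/λ). That kernel is a simplification chosen for readability (steady-state 2D diffusion with first-order decay actually has a modified Bessel function kernel); the rectangular "pocket" geometry, the decay length, and all dimensions are hypothetical, and this is not the model described in Figure S3.

```python
import math


def pocket_source(width: float, height: float, step: float = 0.25):
    """Points on a hypothetical rectangular epithelial pocket: a cap at y = 0
    spanning x in [-width/2, width/2] plus two side walls descending to y = -height.
    The mesenchyme fills the pocket interior (y < 0, |x| < width/2)."""
    pts = []
    for i in range(int(round(height / step)) + 1):
        y = -i * step
        pts += [(-width / 2, y), (width / 2, y)]
    for i in range(1, int(round(width / step))):
        pts.append((-width / 2 + i * step, 0.0))
    return pts


def flat_source(half_len: float, step: float = 0.25):
    """A flat stretch of epithelium along y = 0, e.g. near the base of a broad fold."""
    n = int(round(half_len / step))
    return [(i * step, 0.0) for i in range(-n, n + 1)]


def concentration(point, sources, step: float = 0.25, decay_len: float = 10.0) -> float:
    """Superpose contributions from equally spaced source points (each of length `step`)."""
    px, py = point
    return sum(step * math.exp(-math.hypot(px - sx, py - sy) / decay_len)
               for sx, sy in sources)


depth = 5.0  # distance below the epithelium, in the same arbitrary units as the geometry
narrow = concentration((0.0, -depth), pocket_source(width=10.0, height=50.0))
wide = concentration((0.0, -depth), pocket_source(width=60.0, height=50.0))
flat = concentration((0.0, -depth), flat_source(half_len=100.0))
print(f"narrow tip {narrow:.1f}, wide fold {wide:.1f}, flat epithelium {flat:.1f}")
```

Total secretion per unit length of epithelium is identical in all three cases; only the geometry differs, yet the point inside the narrow pocket receives substantially more signal because the lateral walls lie within a decay length of it.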

To directly test this idea, we examined the distribution of Shh 
with an antibody directed against this protein. Anti-Shh staining 
intensity was plotted along a line from the tip of the folded epithe- 
lium and orthogonal to it (Figure 5D). Prior to E15, anti-Shh reac- 
tivity is identified in the epithelium and the mesenchyme just sub- 
jacent to the endoderm. However, at the transition from zigzags 
to bulges, the mesenchyme in the distal domain of the folded tis- 
sue showed significantly elevated Shh protein accumulation. In 
addition, the shape of the gradient tapers off much more slowly 
within the highly folded epithelium of the E15 gut than within the
broader fold seen at E13. This is consistent with expectations,
since, in addition to Shh protein diffusing from the tip, the
mesenchyme within the narrowly folded E15 epithelium is
exposed to Shh secreted from the epithelium lateral to it,
augmenting the gradient. At both stages, the highest level of Shh
staining is observed within the epithelium itself, which is to be 
expected as the antibody will detect both extracellular and intra- 
cellular protein in the tissue producing the Shh. Importantly, 
however, the level of Shh produced by the epithelium, averaged 
for the nine sections assayed at each time point, is equivalent at 
E13 and E15. To further verify that the architecture of the tissue 
affects Shh protein accumulation, we compared the concentra- 
tion of Shh protein 5 microns below the tip of the epithelial fold at 
E15 versus the concentration present at the same distance 
below the base of the fold. As expected, the intensity of staining 
is much higher within the fold, providing a mechanism explaining 
localized high-level Shh signaling at the epithelial tips. 



Figure 4. ISC Localization Is Regulated by 
BMP Signaling from the Underlying Mesen- 
chymal Villus Cluster Signaling Center 

(A) In situ hybridizations of E14 chick intestines
cultured for 36 hr without (control) or with
cyclopamine or recombinant Shh ligand.

(B) PhosphoSMAD staining of cultured samples
demonstrates the impact of compounds and
recombinant proteins on BMP activity.

(C) EdU labeling of E14 chick intestines cultured
for 36 hr with the listed compounds and
recombinant proteins. Below: quantification of percent
EdU-positive cells across the sub-regions of
epithelial folds; at least three folds on each of
three samples were counted.

(D) Sox9 staining of cultured samples
demonstrates the effect of compounds and recombinant
proteins on Wnt activity.

(E) Quantification of single-molecule FISH for
LGR5 performed on sections from at least 3 E14
chick intestines cultured for 36 hr without (control)
or with cyclopamine. See also Figure S2. Data are
represented as mean ± 1 SD. Scale bars, 50 μm.



If the mesenchyme responds to Shh
by activating villus cluster genes at a
high threshold concentration, this would
explain the observed localization of
high-level villus cluster gene expression.
Indeed, examination of the expression
pattern of villus cluster markers such as
PDGFRa and Bmp4 gives results consistent
with this model (Figure 5C). Consistent
with epithelial morphogenesis acting
upstream of increased Shh signaling and 
hence villus cluster gene activity, and not 
vice versa, after treating with cyclop- 
amine to block hedgehog signaling, we 
observed no alteration in the global struc- 
ture of the epithelium or in individual 
epithelial or mesenchymal cell shape, using
membrane-bound β-catenin to outline
cell contours (Figure S4). 
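Why a modest local increase in Shh could switch on villus cluster genes only at the tips can be sketched with a Hill-type dose response. The Hill coefficients, the half-maximal concentration, and the 1.6-fold tip-to-base ratio below are hypothetical values chosen for illustration, not measurements from this study.

```python
def hill_response(conc: float, k_half: float, n: float) -> float:
    """Fractional activation of a target gene for a Hill-type response."""
    return conc ** n / (k_half ** n + conc ** n)


BASE, TIP = 1.0, 1.6  # hypothetical relative Shh levels under the base vs. the tip
K_HALF = 2.0          # hypothetical high threshold, above the basal level

for n in (1, 4):  # graded (n = 1) vs. switch-like (n = 4) response
    base = hill_response(BASE, K_HALF, n)
    tip = hill_response(TIP, K_HALF, n)
    print(f"n={n}: base {base:.3f}, tip {tip:.3f}, tip/base {tip / base:.2f}")
```

With a graded response the tip is only modestly above the base, whereas a steep, high-threshold response amplifies the same input difference several-fold, which is the kind of behavior that would restrict high-level cluster gene expression to the fold tips.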

To test whether the bending of the epithelium into more tightly 
curved domains, with consequent high levels of localized 
signaling, is indeed responsible for the upregulation of villus clus- 
ter genes in the tips of these structures, we undertook a simple 
experimental manipulation designed to “open” the normally 
tightly folded epithelium. Ringlets of embryonic intestine were 
excised at E14 and placed into culture in vitro. The folds in the 
epithelium arise due to constraint on the proliferating inner layers 
by subadjacent differentiated smooth muscle (Shyer et al., 
2013). To alter this physical constraint, half of the rings were 
turned inside out, putting the endoderm and mesenchyme 
outside of the rings of smooth muscle, allowing the epithelium 
more length to take on a less folded form (Figure 6A). Following 
36 hr of culture, the inside-out ringlets indeed had a broader con- 
tour than their right-side-out counterparts. After culture, the ring- 
lets were sectioned and processed for in situ hybridization with 
Figure 5. Non-uniform Mesenchymal Signals
Are Downstream of Uniform Epithelial
Signal

(A) Luminal views of the chick intestine from E13 to
E15, as progenitor identity is lost from the tips of
the folds (also shown in Figure 3A). Dotted lines
represent the plane of section for transverse views
in (B)-(D).

(B) Schematic of diffusion of signal from an
epithelium of the particular shape at each
stage; darker color represents more signal.
Note the increasing signal overlap in the underlying
mesenchyme as the fold narrows. See also
Figure S3.

(C) In situ hybridization for Bmp4 (above) and PDGFRa
(below) expression from E13 to E15 matches the
predicted pattern in (B) (also shown in Figure 3A).

(D) Distribution of Shh protein in folded tips of the
chick intestine at E13 and E15 (left). Antibody
staining intensity across the 100 μm region boxed
on the left was quantified using the Plot Profile
function in Fiji (right). Brightness values were
normalized to background levels for each image.
A comparison of Shh staining intensity at E13
(graphed in blue) versus E15 (graphed in red)
shows increased Shh staining in the E15
mesenchyme (dotted line denotes the epithelial-mesenchymal
border). The staining intensities across the
E13 and E15 epithelia are not significantly different
(p < 0.08). Three different z slices from each of
three samples were averaged for each stage.
Below, the staining intensity found in a 5 μm by
5 μm region that is 5 μm from the E15 tip epithelium
(pink) is significantly brighter than in the
same-sized region 5 μm from the E15 base
epithelium (yellow) (p < 0.001). Measurements
from two different z slices from each of three
samples were averaged for each E15 region. Data
are represented as mean ± 1 SD. Scale bars,
25 μm.
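The profile quantification in (D) can be sketched as follows. The caption does not specify how values were normalized to background, so this sketch assumes division by a per-image background level, then averages the normalized profiles position by position and reports mean ± 1 SD. The input values are hypothetical, and the function names are illustrative only.

```python
from statistics import mean, stdev


def normalize_profile(profile, background):
    """Express each intensity along the line as a multiple of the image background."""
    return [value / background for value in profile]


def summarize_profiles(measurements):
    """measurements: list of (profile, background) pairs, one per z slice or sample.

    Returns per-position mean and sample SD of the background-normalized profiles.
    Requires at least two profiles of equal length."""
    normalized = [normalize_profile(p, bg) for p, bg in measurements]
    columns = list(zip(*normalized))  # group values by position along the line
    return [mean(c) for c in columns], [stdev(c) for c in columns]


# Hypothetical raw Plot Profile values (tip epithelium -> deeper mesenchyme).
data = [([90.0, 60.0, 30.0], 10.0),
        ([100.0, 66.0, 36.0], 12.0),
        ([80.0, 50.0, 24.0], 8.0)]
means, sds = summarize_profiles(data)
print([round(m, 2) for m in means], [round(s, 2) for s in sds])
```

Normalizing each image before averaging keeps differences in overall staining or exposure between slices from dominating the comparison between stages.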
various villus cluster probes (Ptc1, Bmp4, PDGFRa, and Foxf1).
Each of these was strongly expressed at the fold tips of the 
control ringlets, but all were expressed uniformly at a lower level 
under the epithelium in the inside-out ringlets (Figure 6B). More- 
over, phospho-SMAD staining, indicative of Bmp upregulation in 
villus clusters, is also greatly diminished in the inside-out ringlets 
relative to control cultures (Figure 6B). These results suggest that 
the villus cluster forms in the mesenchyme at the tip of the fold 
because those cells are almost completely encapsulated by 
Shh-expressing epithelium, allowing high threshold responses 
to be activated. 

Preventing villus cluster formation by flipping the intestines 
inside out results in an absence of the localized Bmp signal 
that we demonstrated is responsible for restricting ISC localiza- 
tion within the gut epithelium. Thus, the inside-out ringlets of guts 
would be expected to maintain stem cell properties and proliferation
throughout their epithelium. Indeed, such manipulations
lead to maintenance of uniform proliferation
throughout the epithelium, whereas
proliferation is lost in the epithelium sur- 
rounding the cluster that forms in control rings (Figure 6B). Simi- 
larly, uniform expression of the Wnt target Sox9 and the ISC 
marker Lgr5 is maintained in the inside-out guts lacking villus 
cluster gene expression, whereas it is restricted from the folded 
tips in controls (Figures 6B and S2C). 

As a second way of preventing late stages of epithelial 
morphogenesis, we took advantage of a drug, FK506, that has 
been shown to block smooth muscle differentiation (Fukuda 
et al., 1998). As we previously showed (Shyer et al., 2013), differentiation
of smooth muscle layers is necessary for generation of
the compressive forces that buckle the endoderm into ridges, 
zigzags, and then villi. We cultured guts in vitro from ridge stage 
to late zigzag stage, with or without the presence of FK506. 
Consistent with the results described above, without longitudinal 
muscle differentiation, and hence without progressing beyond 
parallel ridges, the entire endoderm remains proliferative, and 
villus cluster genes are never upregulated. As in vivo, control 
cultures display restricted distal proliferation and activation of
villus cluster gene expression (Figure S5). 

These results indicate that the three-dimensional folding of the 
epithelium is necessary to locally increase Shh signaling (as seen 
in Ptc1 expression) and induce the villus cluster genes. To see if it 
is also sufficient, we sought to create villus-like structures at a 
stage when the epithelium is normally not as tightly folded. Slabs 
of embryonic gut were excised at E10, when the gut is folded into
several wide ridges, and placed into culture in vitro. Half of the 
slabs were placed under a fine grid, causing the luminal surface 
to fold, with continued growth, into many small villus-like bumps, 
long before endogenous villus formation takes place (Figure 6C). 
Slabs were cultured for 36 hr and then processed for in situ 
hybridization with various villus cluster probes (Ptc1, Bmp4,
Foxf1, and PDGFRa). After 36 hr in culture, there is no change in
expression of Shh itself, which continues to be expressed 
uniformly in the epithelium under these conditions (Figure 6D). 
However, while villus cluster gene expression in control 
segments is nearly uniform at a low level under the epithelium, 
samples grown under the grid display elevated expression in 
the mesenchyme under areas of highest epithelial curvature. 
PhosphoSMAD staining is observed in the same locations, reflecting
the change in BMP pathway activity (Figure 6D). Therefore,
simply morphing the tissue into the necessary shape can
induce villus cluster-like local maxima of Shh-responsive genes.
Further, whereas proliferation and Sox9 expression are uniform 
in the control epithelium, in the samples cultured under the 
grid, proliferation and Sox9 expression are lost from the tips of 
the folds, surrounding the areas where mesenchymal expression 
of cluster genes is highest (Figure 6D). 

Taken together, these results demonstrate that the villus clus- 
ter genes are induced at local maxima of Shh activity, resulting 
from the additive effect of signaling that is compounded through 
the folding of the overlying epithelium. 
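The additive argument can be illustrated with a toy calculation. The sketch below is a hypothetical two-dimensional model, not the authors' analysis: every epithelial point is assumed to secrete the same amount of signal, and its contribution at a mesenchymal point is assumed to fall off as exp(-distance/decay length), a simplification chosen for readability. The fold geometry, decay length, and function names are all illustrative assumptions. Comparing a flat sheet with a single finger-like fold, at the same distance from the nearest epithelium, shows how bending alone can raise the local concentration without any change in production.

```python
import math

# Hypothetical toy model (not from the paper): equal point sources along the
# epithelium, each contributing exp(-distance / DECAY_LENGTH) at a target point.
DECAY_LENGTH = 10.0  # assumed decay length, arbitrary units
STEP = 1.0           # assumed spacing between source points along the epithelium


def total_signal(target, sources, decay_length=DECAY_LENGTH):
    """Additive signal at `target` from equally strong point sources."""
    tx, ty = target
    return sum(
        math.exp(-math.hypot(tx - sx, ty - sy) / decay_length)
        for sx, sy in sources
    )


def flat_sources(half_width=100.0, step=STEP):
    """Flat epithelium: sources along y = 0 from -half_width to +half_width."""
    n = int(round(2 * half_width / step)) + 1
    return [(-half_width + i * step, 0.0) for i in range(n)]


def folded_sources(half_width=100.0, w=5.0, h=20.0, step=STEP):
    """Epithelium with one finger-like fold of half-width w and height h:
    a flat base, two vertical walls at x = -w and x = +w, and a semicircular
    tip of radius w centred at (0, h). The fold encloses a mesenchymal core."""
    pts = []
    n_base = int((half_width - w) / step)
    for i in range(n_base):                      # flat base on both sides
        pts.append((-half_width + i * step, 0.0))
        pts.append((w + (i + 1) * step, 0.0))
    n_wall = int(h / step)
    for i in range(n_wall):                      # walls from y = 0 up to the tip
        pts.append((-w, i * step))
        pts.append((w, i * step))
    n_tip = max(1, int(math.pi * w / step))
    for i in range(n_tip + 1):                   # semicircular tip
        theta = math.pi * i / n_tip
        pts.append((w * math.cos(theta), h + w * math.sin(theta)))
    return pts


if __name__ == "__main__":
    w, h = 5.0, 20.0
    # Both targets sit exactly w away from the nearest epithelial source.
    inside_fold = total_signal((0.0, h), folded_sources(w=w, h=h))
    under_flat = total_signal((0.0, -w), flat_sources())
    print(f"fold core: {inside_fold:.2f}  flat sheet: {under_flat:.2f}")
```

With these hypothetical parameters the point in the fold core receives more signal than the point under the flat sheet, because the sources on both walls and the tip all lie within a few decay lengths of it.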

Villus Formation in the Mouse 

To examine the universality of the mechanism we have 
described, we returned to the developing mouse gut. As previ- 
ously described (Sbarbati, 1982; Walton et al., 2012; Shyer 
et al., 2013), the villi of the embryonic mouse gut form directly 
within the lumen without going through intermediate ridge and 
zigzag stages of epithelial folding. A critical question, in terms 
of the model we derived from the chick, is whether the epithelium 
buckles prior to expression of the villus cluster genes in mouse. 
To address this, we serially sectioned E14.5 mouse guts and
carefully examined each section. This is the stage when villi first 
arise in the mouse midgut, forming in a rostral to caudal progres- 
sion. Thus, at this stage, the caudal-most region of the small in- 
testine exhibits no epithelial projections (Figure 7A). Consistent 
with our previous studies showing that smooth muscle differen- 
tiation is required for villus formation (Shyer et al., 2013), we also
see no evidence of the longitudinal smooth muscle in this 
domain, using smooth muscle actin (SMA) as a marker (Fig- 
ure 7A). More rostrally, we see the first buckling of the endoderm 
into small “alcoves,” concomitant with the first appearance of 
the longitudinal smooth muscle staining (Figure 7B). However, 
careful examination of serial sections fails to detect any sign of
upregulation of the villus cluster gene PDGFRa at this rostrocaudal
level (Figure 7B). It is only when one moves
still further rostrally that one sees deeper alcoves displaying 
strong PDGFRa expression at their tips (Figure 7C). Thus, epithelial
morphogenesis precedes villus cluster gene activation.
These descriptive data are at least consistent with the activation 
of villus cluster gene expression in the mouse being a conse- 
quence of higher level Shh signaling in pockets of buckled 
epithelium. 

To directly test whether changing the architecture of the 
epithelium would affect villus cluster gene expression in the 
mouse, we returned to the grid experiment, creating premature
pseudo-villi in the mouse gut by forcing growth through a fine
grid at E13.5, prior to epithelial buckling. As in the chick,
following 24 hr of incubation, the luminal surface folded into 
many small villus-like bumps extending through the holes in 
the grid. Whereas control guts did not show any signs of villus 
cluster gene expression following culture, samples grown under 
the grid showed strong upregulation of PDGFRa at the tip of each
pseudo-villus (Figure 7D). 

As described above, both proliferation and the stem cell 
marker Lgr5 are restricted from the tips of the forming mouse villi 
once villus cluster gene expression is activated. To see whether, 
as in chick, this is due to high-level Shh signaling, we cultured 
developing mouse guts in vitro and blocked the Shh pathway 
with cyclopamine. Cyclopamine treatment was sufficient to 
expand both proliferation and expression of stem cell markers, 
CD44, Sox9, and Lgr5, in the tips of the forming villi in the treated 
guts, whereas control guts cultured in the absence of cyclop- 
amine appeared similar to their in vivo counterparts (Figure 7E). 

These data support the hypothesis that, as in chick, it is 
mechanical deformation of the gut epithelium that leads to high 
concentrations of Shh, hence induction of villus cluster genes 
in the mesenchyme and consequent restriction of stem cells in 
the underlying endoderm. 

DISCUSSION 

Our study has elucidated a series of steps integrating physical 
morphogenesis of the gut epithelium with restriction of stem cells 
to the base of the forming villi. Shh expressed by the endoderm is 
concentrated toward the tips of the buckling epithelial layer 
because of the repositioning of the source of the signal to sur- 
round the distal mesenchyme. This results in the induction of a 
signaling center, the villus cluster, as a high-threshold response. 
Bmp activity, emanating from the villus cluster, acts to oppose 
Wnt signaling and thereby leads to the sequestering of Wnt-sup- 
ported proliferative ISCs to the base of the villi. 

Localization of ISCs in Mice 

Intriguingly, although the intestinal lining of both birds and euthe- 
rian mammals is characterized by the presence of long finger- 
like villi, this morphology appears to have evolved convergently, 
as the gut morphology of lower animals, including fish (Walker 
et al., 2004), amphibians (McAvoy and Dixon, 1978), reptiles 
(Ferri et al., 1976; Kotze and Soley, 1995), and even monotremes 
(Krause, 1975), includes various forms of ridges and folds to increase
the surface area of the lining of the gut, but not individual
villi. The tight packing and long projections of individual villi that 
Figure 6. Epithelial Shape Directs Cluster Formation

(A) Experimental schematic: a ring of E14 intestine (left) is cultured for 36 hr either as a control segment or after first being flipped inside out (right). 

(B) After 36 hr in culture, the cluster signal arises in the control rings (top), similar to what would be found in an E15 intestine. The rings that were flipped inside 
out before culture have an epithelial shape similar to E13 intestine and, concomitantly, an in situ pattern and phosphoSMAD staining that matches expression at 
E13. Proliferation (quantified as in Figure 4), Sox9 expression, and Lgr5 expression are all lost from the tips of folds that form in the control rings. See also 
Figure S2. 

(C) Experimental schematic: a slab of E10 intestine (left) is cultured for 36 hr either as a control segment (where wide ridges will be maintained) or under a fine grid
that induces many small villi-like bumps (right).

represent an optimized solution for increasing surface area
(hence allowing maximal absorption of nutrients) may have 
been selected for independently in the two most highly metabolic 
lineages, mammals and birds. The stepwise progression of 
mucosal folds from ridges to zigzags to villi has been well 
described in the chick (Coulombre and Coulombre, 1958). A 
similar series of transitions, involving segmentation of pre-villus 
ridges to form villi, has been described for several mammals, 
including cattle (Winkler and Wille, 1998) and humans (Hil- 
ton, 1902; Lacroix et al., 1984). In striking contrast, the villi of 
the murine intestine form directly from the floor of a smooth 
epithelium (Sbarbati, 1982). The process of villus formation in 
the mouse does, nonetheless, share at least some mechanistic 
aspects with the chick and other guts where villi form via seg- 
mentation. In both chick and mouse embryonic gut, villus forma- 
tion is prevented by blocking differentiation of the smooth 
muscle (which, at least in chick, acts as a barrier to expansion 
of the epithelium, thereby causing mucosal buckling). Moreover, 
modeling of the physical properties of the embryonic mouse in- 
testine indicates that compressive mechanical forces induced 
by constrained growth are sufficient to explain the emergence 
of villi in mice as in chick (Shyer et al., 2013).

Consistent with this, we found that, concomitant with smooth 
muscle differentiation, the mouse epithelium buckles into small 
alcoves that could, in principle, lead to local elevated concentra- 
tions of Shh protein prior to the onset of villus cluster gene 
expression. As in the chick, stem cell markers and Wnt-respon- 
sive genes are expressed uniformly throughout the gut epithe- 
lium prior to this point and are downregulated at the tips of the 
forming villi as the villus cluster genes are expressed. Also, as 
in the chick, blocking the Shh pathway, and thus downstream 
BMP signaling, is sufficient to expand proliferation and the 
expression of Lgr5, suggesting that the presence of Shh 
signaling normally acts to restrict them from the villus tips. 
Finally, creating villus-like structures prematurely results in the 
upregulation of a marker of the villus cluster through geometric 
constraint. Although the central features involved in gut stem 
cell localization during villus formation, thus, appear to be the 
same in mice and chicks, there is some evidence that there 
may be differences as well. For example, formation of the villus 
cluster in the mouse appears to involve cell aggregation (Walton 
et al., 2012), as well as induction of gene expression, a feature we
have not observed in the chick. Further work will be required to 
gain a fuller picture of how villus formation and stem cell location 
are achieved in mice and to integrate other findings with the 
results described here. 

Bmp Antagonism of Wnt Activity in Restricting 
Proliferation and Stem Cell Activity 

Our data show that the net result of the Shh-Bmp signaling 
cascade is a restriction of proliferation, as well as a decrease 
in expression of Wnt-dependent stem cell markers at the tips
of the developing epithelial folds. We did not, in the context
of this study, explore how this is achieved. However, a similar 
Bmp antagonism of Wnt activity has previously been described 
in the context of the adult intestinal stem cell niche. As we
observed embryonically, Bmp ligands are also strongly
produced by the inter-villus mesenchyme near the tips of the adult
villi with a decreasing gradient toward the crypts (He et al., 
2004; Hardwick et al., 2004; Haramis et al., 2004; Batts et al., 
2006). Moreover, this Bmp activity in the adult villus acts to sup- 
press Wnt signaling to control the balance of stem cell renewal 
and differentiation (He et al., 2004). In this context, the Bmp 
and Wnt pathways are integrated intracellularly at the level of a 
PTEN/Akt-dependent mechanism (Tian et al., 2005). It seems 
likely that this same or a similar mechanism is employed down- 
stream of Bmp activity at the earlier stage investigated here. 

Mechanically Based Induction of Gene Expression 

The physical reshaping of morphogenic gradients represents an 
intriguing paradigm in the integration of mechanics and develop- 
mental signaling. Of course, in addition to this mechanism, 
many instances have been described wherein forces impact 
gene expression through mechanosensory signal transduction. 
In a formal sense, it is certainly possible that mechanosensory 
signaling also contributes to the activation of target gene expres- 
sion during gut epithelial morphogenesis. However, we empha- 
size that ectopic action of Shh is sufficient to induce villus cluster 
gene expression and to restrict the location of stem cells and 
proliferation, while blocking Shh activity is sufficient to result in 
a loss of villus cluster gene expression and expansion of prolifer- 
ation and stem cell localization. Moreover, addition of cyclop- 
amine has no effect on the contour of the epithelium or the shape 
of individual epithelial cells (Figure S4). As the epithelium is bent 
equivalently under conditions with or without cyclopamine, 
the cells should be seeing equivalent strains and stresses, and 
hence similar mechanosensory signaling. Yet the cultures with 
cyclopamine lose villus cluster gene expression, whereas control 
cultures do not, clearly indicating that there is at least a major 
part of the process that is independent of mechanosensory 
transduction. 

Initiation of Discrete Signaling Centers 

Mesenchymal-epithelial crosstalk is an established principle 
in developmental biology— for example, the positive feedback 
loop between the mesenchymal zone of polarizing activity 
(ZPA) and epithelial apical epidermal ridge (AER) in limb develop- 
ment (Laufer et al., 1994; Niswander et al., 1994) or the reciprocal
epithelial-mesenchymal signaling in tooth germ formation (The- 
sleff, 2003). A number of mechanisms have been described for 
establishing the localized signaling centers necessary for such 
interactions. These include reliance on upstream positional infor- 
mation, such as the posterior pre-pattern of Hox gene expres- 
sion necessary to establish the mesenchymal ZPA signaling 



(D) After 36 hr in culture, the cluster gene expression and phosphoSMAD staining in control segments are nearly uniform under the epithelium. However, samples
grown under the grid form villi-like bumps and display non-uniform expression of cluster genes and BMP activity with highest expression in areas of highest 
curvature. Proliferation and Sox9 expression are uniform in the control epithelium, but in the samples cultured under the grid, proliferation and Sox9 expression 
are lost from the tips of folds that form particularly in areas where the curvature is highest and where clusters of mesenchymal expression arise. Data are 
represented as mean ± 1 SD. Scale bars, 50 μm.
Figure 7. Epithelial-Mesenchymal Signaling in a Deforming Field
Drives Localization of Intestinal Stem Cells in Mouse 

(A) The caudal-most region of the small intestine exhibits no epithelial pro- 
jections and no evidence of the outer, longitudinal smooth muscle in this 
domain, using SMA as a marker. 

(B) More rostrally, the first buckling of the endoderm is observed concurrent 
with the first appearance of the longitudinal smooth muscle staining; however, 
no cluster expression of PDGFa at this rostrocaudal level is seen, demon- 
strating that epithelial morphogenesis precedes villus cluster gene activation. 

(C) Even more rostrally, where additional longitudinal smooth muscle differ- 
entiation has occurred, deeper alcoves display strong villus cluster gene 
expression at their tips. Close-up views of the developing outer, longitudinal 
smooth muscle layer (arrowheads) are shown below. 

(D) Villus-like structures were generated through constraint with a mesh grid, 
resulting in the upregulation of the villus cluster marker PDGFRa when 
compared to control cultures grown without the grid. 

(E) Application of cyclopamine to E14.5 mouse guts grown in culture for 30 hr 
results in maintenance of progenitor identity at the tips of forming villi. Prolif- 
eration (Edu), Wnt responsiveness (Sox9), and stem cell markers (CD44 and 



center in the limb (Charite et al., 1994; Knezevic et al., 1997), and
lateral inhibition such as that seen in setting up the spacing of the 
enamel knot signal centers in tooth bud development (Salazar- 
Ciudad, 2012). However, the work here highlights a different 
mechanism involving the use of a uniformly produced signal, 
concentrated not by diffusion or feedback loops but by physical 
deformation of the morphogenic field. Employing the shape 
changes of the developing tissue to dictate where signals arise 
artfully links the process of building a structure with the proper 
placement of its molecularly defined cell types. In the case of 
the intestine, this mechanism assures that specialized cells, 
like ISCs, end up in the right location at the base of each villus
as these structures take shape. Recently, it has been shown 
that tissue architecture can similarly concentrate signaling in 
the context of the developing zebrafish lateral line (Durdu et al., 
2014), although, in this instance, the mechanisms that create 
the luminal pockets where morphogens can accumulate remain 
unclear and may not be related to upstream physical forces. 
Together, these studies suggest that local trapping of a broadly 
secreted signal may be a mechanism that is widely employed in a 
variety of embryological contexts. 

Finally, this study elucidates the embryonic origin of the 
localized adult intestinal stem cells. Because the origins of 
most adult stem cell populations are still unknown, our findings 
compel investigation into potential embryonic origins for other 
adult stem cells. 

EXPERIMENTAL PROCEDURES 

Embryos and Dissections 

Fertile chicken eggs (White Leghorn eggs) were obtained from commercial 
sources. Eggs were incubated at 37.5°C. Timed pregnant CD1 mice were 
obtained from Charles River. 

Immunohistochemistry and EdU Staining

Small intestines were collected from embryos at desired stages and fixed in 
4% paraformaldehyde in PBS and embedded in OCT, allowing for 14-μm
transverse sections of the gut tube. CD44 immunohistochemistry was
performed with rat anti-CD44 (v6) (1:100 Biosciences) and detected using the
Anti-Rat HRP-DAB Cell & Tissue Staining Kit (R&D Systems). The following
antibodies were used for immunofluorescence staining at the listed
concentrations: Sox9 (1:100, R&D Systems), β-catenin (1:100, Sigma), PDGFa (1:100
in chick, 1:300 in mouse, Santa Cruz), FITC-conjugated smooth muscle actin
(1:100, Abcam), phospho-SMAD 1/5 (1:300, Cell Signaling), and Shh (5E1,
1:20). Sections were incubated with primary antibody overnight at 4°C
and then incubated with Alexa secondary antibodies used at 1:300 for 2 hr at
room temperature. DAPI (Molecular Probes) was used as a nuclear counterstain.
100 μM EdU (Invitrogen) was added to guts in culture, and samples
were harvested after 4 hr of EdU incubation. EdU was detected in sectioned
tissue using the Click-iT EdU system (Invitrogen).

In Situ Hybridization and Single-Molecule FISH

Tissue samples for section in situ hybridization were fixed overnight in 4% 
PFA. After fixation, the tissue was rinsed in PBS and incubated in 30% sucrose 
overnight at 4°C before being embedded in OCT. 14-μm-thick cryosections
were collected for DIG-labeled RNA in situ hybridization and 10-μm-thick sections were
collected for single-molecule FISH. DIG-labeled in situ hybridizations were performed as
described previously (Brent et al., 2003). Single-molecule FISH experiments 



LGR5) are all found along the folded epithelium (arrowhead) when cluster 
signals are blocked. Control segments show proper restriction to the base of 
folds. Scale bars, 50 μm.
were performed according to Raj et al. (2008) and Itzkovitz and van Oudenaar-
den (2011). 

Organ Culture 

Chick intestines were dissected from the embryos of the desired stage in cold
PBS, connective tissue was removed, and segments of intestines were placed
on transwells (Costar 3428) or floating above an agar base in DMEM media
supplemented with 1% pen/strep and 10% chick embryonic extract. Chick
intestines were cultured for 36 hr (or as indicated in the figure legends) at 37°C
with 5% CO2. Inside-out intestines were obtained by gently coaxing a ring of
intestine to invert with forceps. To generate guts with artificial villi, segments
of intestine were harvested from E10 embryos, when several ridges are
present. These segments were sliced open to create a slab of intestine that was
placed lumen side up on a transwell. A small piece of fine mesh was placed
gently on top of the slab to induce villi-shaped bumps in culture. Mouse
intestines were dissected from embryos and cultured in DMEM media
supplemented with 1% pen/strep and 20% FBS in a BTC Engineering rotating
incubator with 95% O2. Recombinant ligands (Shh, 4 μg/ml, R&D Systems;
BMP, 1 μg/ml, R&D Systems) and inhibitors (cyclopamine, 10 μM, EMD
Biosystems; Noggin, 1 μg/ml, R&D Systems; FK506, 10 μM, Sigma)
were added at the beginning of culture.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures and 
five figures and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.03.041.
SUMMARY

Understanding how functional lipid domains in live 
cell membranes are generated has posed a chal- 
lenge. Here, we show that transbilayer interactions 
are necessary for the generation of cholesterol- 
dependent nanoclusters of GPI-anchored proteins 
mediated by membrane-adjacent dynamic actin 
filaments. We find that long saturated acyl-chains 
are required for forming GPI-anchor nanoclusters. 
Simultaneously, at the inner leaflet, long acyl-chain- 
containing phosphatidylserine (PS) is necessary for 
transbilayer coupling. All-atom molecular dynamics 
simulations of asymmetric multicomponent-mem- 
brane bilayers in a mixed phase provide evidence 
that immobilization of long saturated acyl-chain 
lipids at either leaflet stabilizes cholesterol-depen- 
dent transbilayer interactions forming local domains 
with characteristics similar to a liquid-ordered (lo) 
phase. This is verified by experiments wherein 
immobilization of long acyl-chain lipids at one leaflet 
effects transbilayer interactions of corresponding 
lipids at the opposite leaflet. This suggests a general 
mechanism for the generation and stabilization of 
nanoscale cholesterol-dependent and actin-medi- 
ated lipid clusters in live cell membranes. 

INTRODUCTION 

The plasma membrane of living cells is the barrier that segre- 
gates the inside of the cell from the outside. It is a fluid bilayer 
composed primarily of lipids and proteins. It has long been 
thought of as an equilibrium mixture giving rise to a “fluid 
mosaic” (Singer and Nicolson, 1972), wherein proteins and lipids 
form regions of distinct composition driven by thermodynamic
forces. Additionally, liquid-ordered (lo)/liquid-disordered (ld) phase
segregation of lipids was expected to give rise to membrane 
“rafts” (Simons and Vaz, 2004). These rafts, in turn, were hypoth- 
esized to facilitate a number of cellular functions such as the 
sorting of specific membrane components for the building of 
signaling complexes, construction of endocytic pits, and transbi- 
layer communication (Simons and Ikonen, 1997). 

Because the cell membrane contains a diverse array of 
lipids with varying acyl chain length/saturation and significant 
levels of cholesterol, even if the cell membrane is globally 
mixed and homogeneous at physiological temperatures, it 
could exhibit small, transient regions with local lo-like char-
acter. Indeed, studies using local probes, spin-labeled lipids 
and electron-spin resonance techniques report deuterium or- 
der parameters consistent with the existence of a fraction of 
membrane lipids exhibiting lo-like conformations (Swamy
et al., 2006). However, macroscopic domains are rarely seen 
in live cells. Studies on the phase behavior of giant plasma 
membrane-derived vesicles from a number of cell types 
show that large phase segregated domains form only when 
these membranes are cooled to temperatures well below 
physiological temperature (Baumgart et al., 2007) or if some 
of the membrane components are artificially clustered (Kaiser 
et al., 2009). 

The simple equilibrium picture of phase segregation of mem- 
brane composition and order runs into several problems. First, 
the plasma membrane is an asymmetric multicomponent bilayer; 
our understanding of phase behavior, local composition hetero- 
geneity, and transbilayer coupling in such systems is preliminary 
(Polley et al., 2012, 2014). Second, the plasma membrane is 
attached to an actin cortex, whose role in influencing local mem- 
brane composition is poorly understood. Finally, the organization 
and dynamics of a variety of plasma membrane molecules such 
as membrane proteins (Gowrishankar et al., 2012; Jaqaman 
et al., 2011), lipid-anchored proteins (Goswami et al., 2008; Prior 
et al., 2003; Sharma et al., 2004), and glycolipids (Fujita et al., 
2007) into nanometer-sized clusters cannot be derived from
equilibrium-based mechanisms. 
Figure 1. Long-Saturated-Acyl-Chain-Containing GPIs Support Nanoclustering

(A-E) Chemical structures of synthetic minimal GPI analogs (A) outline the variation in the di-acyl glycerolipid chain length (C16:0 and C8:0) and saturation
(C18:0 and C18:1) used in this study. GlcNPI analogs carry the fluorescent label BODIPY™ (left) or fluorescein (right). Representative grayscale images of CHO (IA2.2F)
cells with exogenously incorporated GPI analogs as indicated are shown below each analog. Fluorescence anisotropy of GPI(C16:0/C16:0) (red closed diamonds) or

Studies on glycosylphosphatidylinositol (GPI)-anchored pro-
teins (GPI-APs), a large class of plasma membrane proteins 
located at the exoplasmic (outer) leaflet (Gowrishankar et al., 
2012), in particular have demanded a new framework for 
understanding the local control of molecular organization at the 
cell surface. Homo-fluorescence resonance energy transfer 
(FRET)-based fluorescence anisotropy measurements (Sharma 
et al., 2004; Varma and Mayor, 1998), near-field scanning micro- 
scopy (van Zanten et al., 2009), and photoactivation localization 
microscopy (Sengupta et al., 2011) show that ~20%-40% of 
GPI-APs on the membrane are present as nanoclusters, 
whereas the rest are monomers. Other studies have shown 
that monomers are in continuous exchange with relatively immo- 
bile nanoclusters (Goswami et al., 2008; Sharma et al., 2004). 
This organization requires both adequate membrane cholesterol 
and actin dynamics (Goswami et al., 2008). 

GPI-AP clusters are formed by the active engagement of dy- 
namic actin adjacent to the membrane cortex and exhibit un- 
usual properties related to their spatial distribution, small size, 
temperature-independent fragmentation and formation kinetics, 
and non-Brownian density fluctuations (Goswami et al., 2008; 
Gowrishankar et al., 2012). These properties have been ex- 
plained by a theoretical framework (Chaudhuri et al., 2011; Gow-
rishankar et al., 2012) based on active contractile mechanics 
(Marchetti et al., 2013) of dynamic polar filaments. This frame- 
work also makes predictions that have been experimentally veri- 
fied (Gowrishankar et al., 2012). In this mechanism, dynamic 
actin forms transient contractile regions at the cytoplasmic (in- 
ner) leaflet that drive the clustering of the outer leaflet GPI-APs, 
as well as transmembrane proteins that directly associate with 
actin filaments. 

The actin-driven clustering of GPI-APs requires a coupling of 
the lipid-tethered protein across the bilayer to the dynamic con- 
tractile actin configurations at the inner leaflet. Furthermore,
understanding the mechanism of formation of these clusters has
functional significance, both in the sorting of GPI-APs (Mayor
and Pagano, 2007; Mayor and Riezman, 2004) and in modulating 
receptor signaling (Coskun et al., 2011). For example, choles- 
terol-dependent GPI-AP nanoclustering is necessary for pro- 
moting integrin function (van Zanten et al., 2009), which appears 
to take place in focal adhesions that are surrounded by lo do- 
mains (Gaus et al., 2006). 

Here, we show that this transbilayer coupling requires the 
long acyl chains of outer leaflet GPI anchors in association 
with cholesterol and inner leaflet lipids that also carry long 
acyl chains. We identify phosphatidylserine (PS) as the inner 
leaflet lipid responsible for this coupling. All-atom molecular
dynamics (MD) simulations show that local transbilayer coupling
occurs even in membranes that are well above their main transition
temperature, provided the long-acyl-chain-containing
lipids are immobilized at one leaflet of the bilayer. Using a synthetic
linker that couples PS to actin, we show that this immobilization may be
mediated by PS binding to actin. Expression of this linker in cells
results in coupling of exogenously
added lipids with long acyl chains, as well as endogenous 
GPI-APs, to stable long-lived actin structures located at the in- 
ner leaflet. 

This supports the idea that dynamic actin filaments at the inner 
leaflet may have the capacity to immobilize lipids and stabilize 
local lo domains over significant timescales in membranes at 
physiological temperatures. 

RESULTS 

Synthetic Fluorescent GPI Analogs with Long Saturated 
Acyl Chains Mimic GPI-AP Nanoclustering 

In mammalian cells, a typical GPI anchor is a complex glycolipid 
(McConville and Ferguson, 1993), which, in general, possesses
long saturated acyl chains, either C16:0 or C18:0 (Figure S1A).
To test whether the acyl chain length and degree of saturation 
of the GPI anchor affect nanocluster formation, we generated 
synthetic GPI analogs (Figure 1A) and studied their nanoclustering
ability after their incorporation into the plasma membrane of
live cells. The synthesized GPI analogs carry a minimal GPI an- 
chor containing the disaccharide glucosamine-inositol linked to 
phosphatidic acid (GlcNPI) (instead of the full-length GPI;
Figure S1A). Each analog carries a fluorescent probe conjugated to
the GPI head group (Figure 1A). The incorporated GPI analogs 
are retained in the outer leaflet of the plasma membrane 
as indicated by a complete loss of their fluorescence when 
subjected to phosphatidylinositol-specific phospholipase-C 
(PI-PLC) cleavage (Figure S1B). Additional confirmation of their
correct membrane incorporation comes from the observation 
that their diffusion properties also resemble endogenous
GPI-APs (Figures S1C and S1D).

Monitoring the fluorescence anisotropy of these synthetic
fluorescent GPI analogs (which differ only in their acyl chain length)
as a function of their concentration in the live-cell membrane
provides a measure of the extent of clustering of these analogs
(Figure S1E). The fluorescence anisotropy of GPI(C16:0/C16:0) is
much lower than that of GPI(C8:0/C8:0) (Figures 1B and S2B). It exhibits
concentration-independent fluorescence anisotropy over a large
concentration range (Figure 1B), similar to endogenous GPI-APs
(Sharma et al., 2004). Consistent with the generation of nano-
scale clusters, the photobleaching profiles of GPI(C16:0/C16:0)
also mimicked those of the fluorescently tagged folic acid analog
(PLB-FR-GPI; Figures S1F and S1G). Furthermore, on depleting
membrane cholesterol or on disrupting actin activity (blebs
devoid of actin were generated by treatment with jasplakinolide
[Goswami et al., 2008]), GPI(C16:0/C16:0) exhibited an increase in



GPI(C18:0/C18:0) (green closed diamonds) in comparison to GPI(C8:0/C8:0) (red open diamonds) or GPI(C18:1/C18:1) (green open diamonds) determined from images as
above were plotted against a wide range of intensity of fluorescent GPI analogs available at the membrane of live cells. Scale bar, 20 μm (B, C). Cumulative
frequency distributions (CFD) derived from identical intensity ranges of GPI(C16:0/C16:0) and GPI(C8:0/C8:0) (D) or GPI(C18:0/C18:0) and GPI(C18:1/C18:1) (E)
incorporated into cells show the effect of cholesterol depletion by saponin treatment (sap; black lines) or on blebs prepared by treatment with jasplakinolide (jas,
blue lines) with respect to untreated cells (control, red lines). Each data point in the graphs represents average anisotropy with SD for the corresponding intensity
bin obtained from a 10 × 10 pixel region (20-50 regions per cell) from at least 40 cells from 2 independent experiments. Error bars represent SD. See also Figures
S1 and S2.

fluorescence anisotropy (Kolmogorov-Smirnov [KS] test, p <
0.001) (Figures 1D and S2B), similar to PLB-FR-GPI (Figures
S2A, S2C, and S2D), suggesting cholesterol and actin-depen- 
dent nanoclustering. By contrast, GPIc8:o/c8:o exhibited a higher 
fluorescence anisotropy, which did not change upon photo- 
bleaching (Figure S1H), cholesterol depletion (Figures ID and 
S2B), or actin perturbation, which is consistent with its inability 
to form nanoclusters in the cell membrane. 
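Steady-state emission anisotropy is conventionally computed from the parallel and perpendicular emission intensities as r = (I_par - G*I_perp) / (I_par + 2*G*I_perp), where G is an instrument correction factor. The minimal sketch below computes r for individual regions and averages it within total-intensity bins, in the spirit of the intensity-binned anisotropy plots described in the figure legends. The G value, the bin edges, the input format, and the function names are illustrative assumptions, not the authors' analysis code.

```python
from statistics import mean, stdev


def anisotropy(i_par: float, i_perp: float, g: float = 1.0) -> float:
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    total = i_par + 2.0 * g * i_perp
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_par - g * i_perp) / total


def binned_anisotropy(regions, edges, g=1.0):
    """Average anisotropy of regions grouped by total intensity.

    regions: iterable of (I_par, I_perp) pairs, e.g. mean values of small
             pixel regions (hypothetical input format).
    edges:   increasing bin edges for total intensity I_par + 2*G*I_perp.
    Returns a list of (low_edge, high_edge, mean_r, sd_r, n) for non-empty bins.
    """
    bins = [[] for _ in range(len(edges) - 1)]
    for i_par, i_perp in regions:
        total = i_par + 2.0 * g * i_perp
        for k in range(len(edges) - 1):
            if edges[k] <= total < edges[k + 1]:
                bins[k].append(anisotropy(i_par, i_perp, g))
                break  # regions outside every bin are ignored
    summary = []
    for k, values in enumerate(bins):
        if values:
            sd = stdev(values) if len(values) > 1 else 0.0
            summary.append((edges[k], edges[k + 1], mean(values), sd, len(values)))
    return summary


# Hypothetical regions whose anisotropy does not depend on brightness.
regions = [(300.0, 200.0), (600.0, 400.0), (1200.0, 800.0)]
for row in binned_anisotropy(regions, edges=[0, 1000, 2000, 4000]):
    print(row)
```

A roughly flat mean anisotropy across intensity bins corresponds to the concentration-independent behavior described above; a real analysis would also need background subtraction and a measured G factor.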

Comparable to PLB-FR-GPI, the fluorescence anisotropy of GPI C16:0/C16:0 changes upon photobleaching (Figures S1F and S1G), exhibiting a distinct rise in control cells or on the flat membrane (Figures 1D, S2A-S2D, and S1I). Moreover, the fluorescence anisotropy of both PLB-FR-GPI and the GPI C16:0/C16:0 analog remains unchanged upon photobleaching of bleb fluorescence (Sharma et al., 2004) (Figure S1I), consistent with the lack of clusters on the actin-depleted blebs. By contrast, the fluorescence anisotropy of GPI C8:0/C8:0 on membrane blebs is unaffected by photobleaching (Figures S1H and S1J), confirming its inability to form nanoclusters.
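The photobleaching signature used above can be expressed as a simple trend test: when fluorophores sit in homo-FRET-competent clusters, bleaching removes energy-transfer partners, so anisotropy rises as intensity falls, whereas isolated fluorophores keep a constant anisotropy. The sketch below fits a least-squares slope of anisotropy against intensity normalized to the first frame. The slope threshold and the function names are arbitrary assumptions for illustration, not a criterion taken from the study.

```python
from statistics import mean


def bleaching_slope(intensities, anisotropies):
    """Least-squares slope of anisotropy vs. intensity normalized to frame 0."""
    if len(intensities) != len(anisotropies) or len(intensities) < 2:
        raise ValueError("need two equally long series with at least 2 frames")
    i0 = intensities[0]
    x = [i / i0 for i in intensities]
    y = list(anisotropies)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("intensity did not change during bleaching")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def rises_on_bleaching(intensities, anisotropies, threshold=-0.02):
    """Hypothetical rule: a clearly negative slope (r up as I down) flags homo-FRET."""
    return bleaching_slope(intensities, anisotropies) < threshold


clustered = rises_on_bleaching([1000, 800, 600, 400], [0.20, 0.22, 0.24, 0.26])
monomeric = rises_on_bleaching([1000, 800, 600, 400], [0.25, 0.25, 0.25, 0.25])
print(clustered, monomeric)  # True False
```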

The synthetic GPI analogs allowed us to directly assess the role of acyl chain (un)saturation by comparing the clustering abilities of GPI C18:0/C18:0 and GPI C18:1/C18:1. Here, we used fluorescein-tagged GPI analogs. As observed for its BODIPY-labeled counterpart (Figure 1A), the fluorescence anisotropy of GPI C18:0/C18:0 is concentration independent (Figure 1C) and rises upon cholesterol depletion and in membrane blebs (KS test, p < 0.001 in both cases) (Figure 1E), consistent with the formation of nanoclusters. In contrast, the fluorescence anisotropy of GPI C18:1/C18:1 is higher than that of GPI C18:0/C18:0 (Figure 1C) and, unlike that of GPI C18:0/C18:0, does not change significantly upon cholesterol depletion or on membrane blebs (Figure 1E), indicating a reduced ability to be recruited to nanoclusters. These results suggest that GPI anchor clustering requires long saturated acyl chains to support actin- and cholesterol-based nanoclustering.

GPI-Anchored Protein Nanoclustering Is Abrogated in 
GPI Anchor Remodeling Mutants 

During GPI anchor biosynthesis, cells specifically remodel the unsaturated acyl chain present at the sn-2 position of the immature GPI anchor to generate long saturated acyl chain lipids (either 16:0 or 18:0) (McConville and Ferguson, 1993). This lipid remodeling is mediated by the key enzymes PGAP2 and PGAP3 (Maeda et al., 2007). Cell lines carrying mutations in both PGAP2 and PGAP3 express cell-surface GPI-APs with unremodeled GPI anchors containing unsaturated acyl chains at the sn-2 position of the glycerophospholipid (Maeda et al., 2007). This enabled us to test the requirement for long saturated acyl chains in endogenous GPI-AP nanoclustering. We measured the extent of GPI-AP clustering in mutant and wild-type cells by determining the extent of homo-FRET between GPI-APs at the cell surface, monitored via the fluorescence anisotropy of fluorescently tagged FLAER (A488F) (Brodsky et al., 2000). The fluorescence anisotropy in mutant cells is significantly higher (KS test, p < 0.001) than that in wild-type cells (Figure 2A) and is similar to the fluorescence anisotropy of A488F-labeled GPI-APs measured in cells treated with the cholesterol-sequestering agent saponin.
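The KS comparisons reported here contrast two anisotropy distributions through the maximum vertical distance between their cumulative frequency distributions. A dependency-free sketch of the two-sample statistic and its classic asymptotic p-value approximation follows; in practice, SciPy's `scipy.stats.ks_2samp` provides a tested implementation. The sample values and function names are hypothetical, and the asymptotic approximation is only reasonable for moderately large samples.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_pvalue(d, na, nb):
    """Asymptotic p-value from the Kolmogorov series (Numerical Recipes style)."""
    ne = na * nb / (na + nb)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and Q(lam) is ~1 anyway
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


control = [0.10 + 0.001 * k for k in range(50)]  # hypothetical anisotropies
mutant = [0.20 + 0.001 * k for k in range(50)]
d = ks_statistic(control, mutant)
print(d, ks_pvalue(d, len(control), len(mutant)) < 0.001)  # 1.0 True
```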



Incorporation of GPI C16:0/C16:0 into the plasma membrane of PGAP2/3 mutant cells showed that GPI C16:0/C16:0 clusters to the same extent in both wild-type and mutant cells (KS test, p < 0.001) (Figure 2B), confirming that the lack of nanoclustering in these cell lines is due to the unsaturated lipid tails of their GPI-APs and not to any other artifact arising from PGAP2/3 perturbation. These experiments indicate that cholesterol- and actin-dependent nanoclustering of endogenous GPI-APs also requires long saturated acyl chains in their lipid moiety, consistent with the results obtained with the synthetic GPI analogs.

Inner Leaflet PS Is Required for GPI-AP Nanoclustering 

There are two ways in which long-acyl-chain-containing GPI-APs could connect to actin at the inner leaflet: one involving a transmembrane linker and the other via lipidic interactions with inner leaflet lipids. To distinguish between these possibilities, we examined the roles of phosphatidylinositol 4,5-bisphosphate (PI(4,5)P2) and PS, which are obvious candidate inner leaflet lipids that could couple outer leaflet GPI-APs with actin. Both lipids are also known to interact with several actin-binding proteins (Yin and Janmey, 2003). PS, for example, binds specifically to actin-binding proteins such as spectrin, talin, and various others (Makuch et al., 1997; Muguruma et al., 1995), whereas the pleckstrin homology (PH) domains present in many proteins interact with PIPs and actin (Yin and Janmey, 2003).

To test the role of these lipids in actin-driven nanoclustering, we expressed protein domains capable of binding PS or PI(4,5)P2, the most abundant plasma membrane PIPn (Stauffer et al., 1998), to putatively mask the interaction of these lipids with the cytoplasmically disposed actin filaments. We used a fusion construct of GFP with the discoidin-like C2 domain of lactadherin (Lact C2-GFP; Yeung et al., 2008) to mask PS at the inner leaflet, and the PH domain of PLCδ fused to the NH2 terminus of GFP (PH-GFP; Stauffer et al., 1998) to mask PI(4,5)P2. Cells expressing Lact C2-GFP exhibited higher fluorescence anisotropy of PLB-FR-GPI, consistent with a reduced extent of nanoclustering (KS test, p < 0.001) (Figures 3A and 3B). This was not due to an alteration in the lipid profile of cells expressing Lact C2-GFP, because their lipid composition was unaltered compared to cells transfected with GFP alone: individual lipid classes in transfected cells varied between 93% and 99% of the control values. By contrast, there was no significant effect on the fluorescence anisotropy of PLB-labeled FR-GPI when we expressed PH-GFP (Figures 3A and 3B), nor when PI(4,5)P2 levels were perturbed using an antibiotic such as neomycin or a phospholipase C activator such as chlorpromazine (Figure S3), indicating that PI(4,5)P2 is not involved.

Because GPI-AP nanoclusters depend on actin-based mech- 
anisms, masking PS via the Lact C2 domains could reflect a non- 
specific effect of the inaccessibility of PS to cytosolic factors 
necessary for actin polymerization. To rule out these effects 
and show that direct association of PS with actin is sufficient 
for GPI-AP nanoclustering at the outer leaflet, we expressed 
the Lact C2 domain fused to the actin-filament binding domain 
of Ezrin (Lact C2-Ez-YFP; Figures 3C and 3D). Similar to the Lact C2 construct, this protein is also recruited to the plasma membrane, in contrast to a cytosolic EGFP control that lacks plasma membrane binding capacity (Figures 3C and 3D). More importantly, only the fusion construct that connects PS to actin restores nanoclustering of GPI-APs (Figures 3C and 3D), emphasizing the role of actin and its ability to link up to PS in facilitating nanocluster formation.

Figure 2. GPI-AP Nanoclustering Is Reduced in GPI Anchor Lipid Remodeling Mutants

(A and B) Fluorescence anisotropy of fluorescently tagged FLAER (Alexa-488-FLAER, A488F) in wild-type and PGAP2/3 double-mutant CHO cells (blue diamonds and red squares, respectively) plotted against fluorescence intensity shows an increase in anisotropy in mutant cells (A), corresponding to a loss of homo-FRET between A488F-labeled GPI-APs. Intensity and anisotropy were determined from images collected from cells as shown on the right. CFD plots and images are shown (A) for wild-type (red line), PGAP2/3 double-mutant (green line), and saponin-treated (violet line) cells and (B) for GPI C16:0/C16:0 in control (red line) and cholesterol-depleted (black line) conditions in WT and PGAP2/3 mutant cells. CFD plots show that A488F-labeled GPI-APs in mutant cells exhibit an increase in anisotropy compared to wild-type cells, and that exogenously incorporated GPI C16:0/C16:0 exhibits significantly depolarized fluorescence anisotropy (control) in both wild-type (top) and mutant cells (bottom) that is sensitive to cholesterol depletion by saponin (black line). Each data point in the graphs and CFDs represents average anisotropy values derived from nearly 40 cells from 3 independent experiments. Error bars represent SD.

GPI-AP Nanoclustering Requires Long-Acyl-Chain- 
Containing PS 

To explore the nature of the PS acyl chains involved in coupling with GPI-APs at the outer leaflet, we measured GPI-AP nanoclustering in PS-synthesis-deficient Chinese hamster ovary (CHO) cell lines (PSA3 cells). These cells carry a mutation in the PSS1 gene (Nishijima et al., 1986), which renders them completely dependent on phosphatidylethanolamine (PE) (Figure S4A; Kennedy and Weiss, 1956; Percy et al., 1983). PS levels at the plasma membrane of PSA3 cells grown in the absence of ethanolamine (deplete) are drastically reduced compared to cells grown in its presence (replete) (Figures S4B and S4C). To measure the extent of endogenous GPI-AP nanoclustering, we compared the fluorescence anisotropy of the GPI-binding toxin A488F at the surface of PS-deplete and PS-replete cells. Nanoclustering of endogenous GPI-APs was disrupted in PS-deplete cells, to an extent comparable to that obtained in cells depleted of cholesterol (KS test, p < 0.001; Figures 4A and 4B). To confirm that the defect in these cells was not due to any perturbation of the endogenous GPI anchor in the PS-deplete condition, we established that, whereas PS-replete cells were capable of supporting nanoclustering of exogenously added GPI C16:0/C16:0, PS-deplete cells failed to do so (Figures S4D and S4E).

Figure 3. Masking of PS Binding Sites Alters GPI-AP Nanocluster Organization

(A-D) Cropped fluorescence intensity and anisotropy images of FR-GPI-expressing CHO cells (A and D) transfected with EGFP, EGFP tagged to the C2 domain of lactadherin (Lact C2-GFP), the PH domain of PLCδ (PH-GFP), or a fusion construct of Lact C2 and the actin-binding domain of Ezrin (Lact C2-Ez-YFP), and the corresponding CFD plots (B and C), were obtained from wide-field (A and B) and TIRF (C and D) microscopes, respectively. The fluorescence anisotropy of PLB bound to FR-GPI in cells expressing EGFP (red line) is comparable to that obtained in cells expressing PH-GFP (violet line, middle) or Lact C2-Ez-YFP (black line, right) but is increased in cells expressing Lact C2-GFP (green line, left or right). This is in turn comparable to cells treated with saponin or MβCD (blue line). See also Figure S3.

We next replenished the pool of PS in PS-deplete cells by adding various PS species differing in acyl chain length and saturation, and we confirmed their incorporation at the inner leaflet by their ability to be stained with Annexin V only after ionomycin treatment (Figures S4B and S4C). Our results show that only the long alkyl-chain-containing analog is capable of restoring nanoclustering of GPI-APs (Figures 4A, 4B, and 4E; PS C12:0/C12:0*, PS C18:0/C18:0*, and PS C18:1/C18:1*), despite the incorporation of all analogs into the membrane at similar levels (Figure S5C). Here, we used synthetic acyl/alkyl PS analogs that are resistant to phospholipase A2 (PLA2) cleavage (Burke and Dennis, 2009) (Figures 4A, 4B, and 4E). This experimental strategy was adopted because exogenous addition of any diacyl PS species restored nanoclustering of endogenous GPI-APs (Figure S4F) in the absence of a PLA2 inhibitor (methyl arachidonyl fluorophosphonate; Figures S4F and S4G). By contrast, in the presence of the PLA2 inhibitor, only the long saturated PS C18:0/C18:0 restored GPI-AP nanoclustering, whereas the short and unsaturated lipids, PS C12:0/C12:0 and PS C18:1/C18:1, were incapable of restoring it (Figures S4F and S4G). This suggested that PLA2-like enzymes remodel the acyl chains of exogenously incorporated PS at the inner leaflet. We also found that exogenous addition of long-acyl-chain PE or PC to the same levels as the PS species (Figure S5B) in PS-deplete conditions does not rescue GPI-AP nanoclustering (Figure S4H).

In CHO cells, the most abundant PS species is the asymmetric PS C18:0/C18:1 (Figure S4I); hence, we determined nanocluster recovery by adding exogenous asymmetric PS C18:0/C18:1 in the presence of the PLA2 inhibitor. The restoration of GPI-AP nanoclustering by the asymmetric lipid was quantitatively equivalent to that by the fully saturated long-chain PS C18:0/C18:0 (Figures 4C and 4D). These results strongly suggest that GPI-APs at the outer leaflet couple across the bilayer with PS that carries at least one long saturated chain, provided that cholesterol levels are adequate.

Atomistic MD Simulations Provide a General Mechanism
for Transbilayer Coupling 

To understand the mechanism by which long-chain GPI and PS lipids couple across the fluid bilayer, we developed atomistic molecular dynamics simulations of membrane bilayers comprising distinct upper (palmitoyloleoyl-phosphatidylcholine [POPC], palmitoyl-sphingomyelin [PSM], and cholesterol [Chol]) and lower (POPC and Chol) leaflets capable of phase segregation into lo and ld phases (Polley et al., 2012, 2014). This approach has allowed an exploration of the effect of lo-ld segregation in either of the two leaflets (Polley et al., 2012, 2014). Here, we examined the regime in which both leaflets of the asymmetric bilayer membrane are macroscopically in the homogeneous mixed ld phase. We asked under what conditions trace amounts of GPI in the putative outer (upper) leaflet would register with PS in the putative inner (lower) leaflet, and what the nature of this transbilayer coupling is. All simulation details, including force fields, tests of the approach to thermodynamic equilibrium, and stress profiles at equilibrium, are presented in the Extended Experimental Procedures (Figure S6B).

Figure 4. Nanoclustering of GPI-AP at the Cell Surface Requires Long-Acyl-Chain-Containing PS

(A-D) Cropped fluorescence intensity and anisotropy images (A and C) and cumulative frequency distributions (B and D) of A488F-labeled PSA3 mutant CHO cells grown with (PS replete [PS+]; red line) or without (PS deplete [PS-]; green line) ethanolamine and supplemented with PLA2-insensitive PS analogs of the indicated acyl chain lengths and saturation (A and B; blue or black line) or with PLA2 inhibitor and PLA2-sensitive PS analogs of the indicated acyl chain saturation (C and D; blue and black lines).

(E) Chemical structures of the PLA2-insensitive (black) and PLA2-sensitive (gray) PS analogs used in (A)-(D), respectively.

See also Figures S4 and S5.

We find that, regardless of chain length/saturation and relative composition, as long as both leaflets of the bilayer are in the ld phase (characterized by low values of the deuterium order parameter S, a measure of the extent of chain ordering), the distribution of GPI and PS is uniform, with no transbilayer registry (Figures 5A, S6A, and S6C). The situation is entirely different, however, if we cluster and immobilize either PS or GPI (Figures 5A, 5B, and S6A). In this case, we obtain co-segregation and perfect bilayer registry, a situation that represents a constrained equilibrium because of the immobilization (Supplemental Information; Figure S6C). Note that high transbilayer coupling in the ld phase is obtained only at adequate levels of cholesterol in the two leaflets (Figure 5C).
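The deuterium order parameter invoked above is conventionally defined per C-H (or C-D) bond as S_CD = <(3 cos^2 θ - 1)/2>, where θ is the angle between the bond and the bilayer normal and the average runs over molecules and frames. The sketch below evaluates that average for a list of bond vectors. Taking the normal along z and supplying bond vectors directly are assumptions; extracting them from a simulation trajectory is not shown, and the function name is hypothetical.

```python
import math
from statistics import mean


def order_parameter(bond_vectors, normal=(0.0, 0.0, 1.0)):
    """Return S = <(3 cos^2(theta) - 1) / 2> over the given C-H bond vectors.

    theta is the angle between each bond vector and the bilayer normal.
    """
    nx, ny, nz = normal
    n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
    values = []
    for vx, vy, vz in bond_vectors:
        v_len = math.sqrt(vx * vx + vy * vy + vz * vz)
        cos_t = (vx * nx + vy * ny + vz * nz) / (v_len * n_len)
        values.append(1.5 * cos_t * cos_t - 0.5)
    return mean(values)


# C-H bonds lying in the membrane plane (perpendicular to the normal),
# as on an all-trans chain aligned with z.
print(order_parameter([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))  # -0.5
```

Because C-H bonds on a well-ordered chain lie nearly perpendicular to the normal, simulation studies usually report -S_CD; larger magnitudes indicate more ordered (lo-like) chains, and values near zero indicate disordered (ld-like) chains.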
Figure 5. Atomistic MD Simulations Capture Transbilayer Interdigitation of Long Acyl Chain Lipids

(A) Equilibrium configurations of an asymmetric bilayer composed of POPC (gray), PSM (magenta), Chol (yellow), GPI (red or green), and PS (blue) embedded in water (cyan) from MD simulations in the ld phase. The upper leaflet comprises 4% PSM, 4% Chol, 10 long saturated GPI C16:0/C16:0, and the rest POPC, whereas the lower leaflet has 35% Chol, 25 long saturated PS C18:0/C18:0, and the rest POPC, shown when PS is not constrained (right; Figure S6A) or when PS molecules are immobilized (left and middle); the zoomed view (middle) shows bilayer registry and interdigitation of GPI and PS. The local region surrounding interdigitating PS and GPI contains enhanced levels of PSM and Chol, resembling a local lo-like nanodomain.

(B) Extent of bilayer registry between GPI and PS measured by the transbilayer correlation coefficient C_ul (Experimental Procedures), which takes the value 1 (0) when bilayer registry is strong (weak). Data shown are for the same bilayer composition as in (A). Transbilayer coupling C_ul is significant only when either PS or GPI is held in a cluster and immobilized. There is no transbilayer coupling, and hence no registry, when GPI and PS are unconstrained.

(C) Levels of cholesterol in the upper/lower leaflets needed to obtain high transbilayer coupling C_ul in the ld phase (white [high C_ul] or gray [low C_ul]). The cholesterol concentrations in the two leaflets are varied as shown (dots); the upper leaflet has 10 GPI and the lower leaflet has 15 PS, with POPC contributing the rest. The PSM concentration in the upper leaflet is the same as that of cholesterol. Red (blue) dots designate high C_ul = 1 (low C_ul = 0).

(D) Coherence length of membrane component density and deuterium order parameter in the two leaflets at late times when PS is held immobilized, computed from the exponential decay of their profiles. Coherence length increases nonlinearly with the number of immobilized PS. The upper leaflet comprises 92% POPC, 4% PSM, 4% Chol, and 10 GPI; the lower leaflet comprises 65% POPC and 35% Chol, with the number of PS varying from 2 to 20.

(E) Transbilayer coupling C_ul between GPI and PS is sensitive to lipid chain length and degree of acyl chain saturation. The upper leaflet comprises 33.3% PSM, 33.3% Chol, 10 GPI, and the rest POPC; the lower leaflet comprises 10% PS, 10% Chol, and the rest POPC. Chain length and degree of saturation of PS and GPI are varied as indicated. Strong transbilayer coupling and bilayer registry are obtained only when both PS and GPI have long saturated acyl chains.

Error bars represent SD. See also Figure S6.
This co-segregation is accompanied by a steady increase in the local chain stiffness of the membrane components, as determined by the local deuterium order parameter S, reflecting local lo ordering within the co-segregation region (Figure S6C). Similar dynamics toward co-segregation occur when PS starts with a uniform distribution and GPI is kept immobilized (data not shown). We also studied the stability of co-clustering when both GPI and PS are initially clustered at the center, with PS held immobile (Figure S6D). From the profiles of GPI density and order parameter, we can extract a length scale over which the lipids maintain lo-like order (the coherence length) as a function of the number of immobilized PS molecules (Figure 5D). The nonlinear increase in coherence length with the number of immobilized PS also reflects the formation of a local lo nanodomain.
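The coherence length in Figure 5 is described as being computed from the exponential decay of density and order-parameter profiles. Assuming a profile of the form f(r) = f_bulk + A exp(-r/ξ) with a known bulk value, ξ can be estimated from a straight-line fit to ln(f - f_bulk) versus r, as in the sketch below. The functional form, the known baseline, the units (distances in nm), and the function name are assumptions; the exact fitting procedure used in the study is not given in this section.

```python
import math
from statistics import mean


def coherence_length(distances, profile, baseline):
    """Estimate xi from f(r) = baseline + A * exp(-r / xi) by a log-linear fit."""
    xs, ys = [], []
    for r, f in zip(distances, profile):
        excess = f - baseline
        if excess > 0:  # points at or below the baseline carry no decay information
            xs.append(r)
            ys.append(math.log(excess))
    if len(xs) < 2:
        raise ValueError("need at least two points above the baseline")
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    if slope >= 0:
        raise ValueError("profile does not decay with distance")
    return -1.0 / slope


# Hypothetical order-parameter profile decaying over ~2 nm toward a bulk value of 0.1.
r = [0.5 * k for k in range(10)]
s = [0.1 + 0.3 * math.exp(-x / 2.0) for x in r]
print(round(coherence_length(r, s, baseline=0.1), 3))  # 2.0
```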

This transbilayer coupling is surprisingly sensitive to reducing the acyl chain length (or lowering the degree of saturation) of either GPI or PS (Figure S6E), as revealed by the transbilayer correlation coefficient (Figure 5E). This is manifest in the deuterium order parameter profile, which remains small (close to the ld value), thus failing to achieve the transbilayer coupling needed for bilayer registry (Figure S6E). Thus, our atomistic simulations of the ld phase of the asymmetric bilayer indicate that transbilayer coupling of long saturated GPI and PS can be achieved only in the presence of adequate amounts of cholesterol, and provided that PS is held immobilized. The lifetime of this co-clustering is therefore set by the lifetime of PS immobilization. These simulations imply that, in a multicomponent bilayer in the mixed ld phase (not far from the lo/ld transition), clustering and immobilizing a few long-acyl-chain lipids should suffice to effect transbilayer coupling by stabilizing small lo-like regions that could arise spontaneously owing to proximity to an lo/ld transition.

Long Saturated Fatty Acyl Chains of Phospholipids Are 
Sufficient for Their Nanoclustering 

To directly test the predictions of the atomistic MD simulations, we incorporated a synthetic PE analog with long acyl chains conjugated to fluorescein (F-DHPE) into CHO cell membranes and determined its nanoscale organization. Similar to endogenous GPI-APs, the fluorescence emission anisotropy of F-DHPE is concentration independent and increases upon cholesterol depletion and photobleaching (data not shown, but see Figure 6E). Furthermore, F-DHPE incorporated into PS-deplete cells did not form nanoclusters, whereas in PS-replete cells it formed cholesterol-sensitive nanoclusters (Figures 6C and 6D). Consistent with the absence of nanoclusters in PS-deplete cells, the fluorescence anisotropy of F-DHPE in these cells also does not increase upon photobleaching. By contrast, anisotropy rises upon photobleaching in PS-replete cells, consistent with the presence of PS-dependent nanoclusters of F-DHPE (Figure 6E). This verifies that long saturated acyl chains are sufficient to support cholesterol-sensitive, PS-mediated nanoclusters of lipids in the membranes of live cells.

Immobilization Promotes Transbilayer Coupling
Mechanism 

To test the role of immobilization in effecting transbilayer coupling of specific lipid components in either leaflet, as predicted by our simulations (Figure 5), we determined whether crosslinking outer leaflet GPI-APs into optically resolvable clusters could recruit inner leaflet PS molecules to these sites independently of the actin machinery. Conversely, if we are able to immobilize PS at the inner leaflet, long-acyl-chain-containing species at the outer leaflet should colocalize to these regions. First, we crosslinked the folate receptor (FR-GPI) at the outer leaflet (Mayor and Maxfield, 1995) and examined the co-distribution of inner leaflet lipid probes in plasma membrane blebs. In cells expressing Lact C2-GFP, which probes inner leaflet PS, there is a strong correlation between the intensity distributions of Lact C2-GFP and PLB-FR-GPI, whereas a significantly reduced correlation was observed with PH-GFP, the PIP2 probe (Figures 7A and 7B). Moreover, this correlation decreases upon cholesterol depletion for Lact C2-GFP (KS test, p < 0.001) but remains low and unaltered for PH-GFP (Figures 7A and 7B). As a control, we determined whether crosslinking the FR domain linked to a transmembrane domain (FR-TM-Ez; Gowrishankar et al., 2012) could recruit Lact C2-GFP. We found no significant correlation between crosslinked FR-TM-Ez and Lact C2-GFP (Figures 7A and 7B). These results indicate that PS at the inner leaflet couples strongly with crosslinked GPI-AP patches at the outer leaflet in a cholesterol-sensitive manner. Second, when the actin-filament-binding Lact C2-Ez-YFP is expressed in CHO cells, it is recruited to the plasma membrane (Figures 3D and 7C), where it is concentrated on relatively stable actin stress fibers visible at the membrane surface in a TIRF field (Figures 3D and 7C). This provides an experimental handle to visualize actin-immobilized inner leaflet PS. Correspondingly, the fluorescence intensity distributions of an outer leaflet GPI-AP and of exogenously added DHPE (B-DHPE; Figure 7C) are correlated with regions of Lact C2-Ez-YFP enrichment. No enrichment of an exogenously added short-chain synthetic lipid analog (GPI C8:0/C8:0) is observed in regions of Lact C2-Ez-YFP enrichment (Figure 7C), confirming that this transbilayer coupling requires long acyl chains. The concentrating effect of PS on the outer leaflet lipid is observed only in the presence of an actin-PS connector; it was absent when the F-actin-binding domain of Ezrin or the PS-binding domain was expressed on its own (Figure S7), consistent with the role of actin and its PS-binding partners in aiding the formation of nanoclusters. Taken together, the experiments point toward a general mechanism underlying transbilayer coupling in which either the outer or the inner leaflet molecules need to be immobilized.
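The section does not state which correlation measure was used to compare the Lact C2-GFP or PH-GFP intensity distributions with crosslinked PLB-FR-GPI patches. A common choice for such two-channel comparisons is the pixel-wise Pearson correlation coefficient within a region of interest such as a bleb; the sketch below implements that measure under this assumption, with images given as nested lists and an optional boolean mask (both hypothetical input formats, as is the function name).

```python
import math


def pearson_colocalization(channel_a, channel_b, mask=None):
    """Pixel-wise Pearson correlation between two equally sized 2D images.

    mask: optional 2D list of booleans selecting the pixels to include
          (for example, pixels inside a membrane bleb).
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("images must have the same number of rows")
    pairs = []
    for r, (row_a, row_b) in enumerate(zip(channel_a, channel_b)):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same shape")
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if mask is None or mask[r][c]:
                pairs.append((a, b))
    if len(pairs) < 2:
        raise ValueError("need at least two pixels")
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel is constant over the selected pixels")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical 2x3 bleb images: the probe channel tracks the patch channel.
patches = [[10, 50, 90], [20, 60, 100]]
probe = [[5, 25, 45], [10, 30, 50]]
print(round(pearson_colocalization(patches, probe), 6))  # 1.0
```

Per-bleb coefficients from two conditions (for example, with and without cholesterol depletion) can then be compared as distributions rather than as single values.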

DISCUSSION 

Our experimental and simulation results provide evidence that nanoclustering of outer leaflet GPI-APs, and indeed of any outer leaflet lipid, by dynamic cortical actin is effected by the interdigitation and transbilayer coupling of lipids with long saturated acyl chains in both the outer and inner leaflets of the PM. This is contingent on the properties of the lipid acyl chains, adequate cholesterol levels in the bilayer, and immobilization of the inner leaflet lipid. In contrast to transmembrane-anchored actin-binding proteins, which straddle the bilayer, these three features allow flexibility in, and regulation of, transbilayer communication between lipids.

Figure 6. Lipids with Long Saturated Acyl Chains Are Sufficient to Drive Nanocluster Formation

(A-D) Cumulative frequency distributions (A and C) and fluorescence intensity and anisotropy images (B and D) of F-DHPE incorporated into control (IA2.2F) cells (A and B) and PSS1-deficient (PSA3) CHO cells (C and D) show that the fluorescence anisotropy of F-DHPE in control and PS-replete cells (control, PS+; red line) is depolarized compared to that measured in cholesterol-depleted (black line) or PS-deficient (PS-) cells (blue line). Note that the fluorescence anisotropy of F-DHPE in PS-deplete (PS-) cells (blue line) is similar to that measured in saponin-treated cells (black line). Each data point in the graphs represents average anisotropy with SD for the corresponding intensity bin obtained from a 10 × 10 pixel region (20-50 regions per cell) from at least 40 cells derived from two independent experiments.

(E) Photobleaching profiles of F-DHPE incorporated into PS-replete (PS+) and PS-deplete (PS-) cells. PS-replete and PS-deplete cells labeled with exogenously added F-DHPE were photobleached, and the fluorescence emission anisotropy was recorded during the photobleaching process. Note that the profiles of change in fluorescence anisotropy with change in fluorescence intensity in PS-replete (PS+) cells are characteristic of nanoclustered fluorophores (Sharma et al., 2004), whereas PS-deplete (PS-) cells exhibit no change, indicating the lack of homo-FRET. The starting intensity for all samples collected here is similar and normalized to that of the first frame.

Error bars represent SD. See also Figure S4.

The requirement for long acyl or alkyl chains to couple across the bilayer provides a purely lipidic coupling mechanism, obviating the need for any transmembrane protein coupling mechanism. This could also serve as a way to couple many outer leaflet membrane lipids, such as gangliosides and other sphingolipids (Wolf et al., 1998). The simulations show that cholesterol can stabilize local lo-like order over a length scale larger than the size of the immobilized cluster, suggesting that this order might also recruit more components via a positive feedback mechanism, leading to a composition gradient of components that favor lo domains.

The immobilization of the inner leaflet lipid relates to the mechanism needed for nanoclustering. In the atomistic MD simulations, carried out over a timescale of 200 ns (and reconfirmed by longer 1 μs runs; Supplemental Information), transbilayer coupling requires immobilization of the PS lipid; removing the anchoring leads to a rapid loss of lipid clustering in both leaflets. We previously showed that GPI-AP nanoclusters are “immobile” for a period of 0.1-1 s (Goswami et al., 2008). This could reflect the time of engagement of dynamic cortical actin filaments at the inner surface of the cell membrane (Gowrishankar et al., 2012). Additionally, immobilization of more than one lipid molecule is necessary to create the transbilayer connection; the number of molecules that must be immobilized to couple across the bilayer depends on how far the membrane composition lies from the equilibrium lo-ld phase transition.

Synthetic lipids with long acyl chains couple across the bilayer, forming dynamic actin-based nanoclusters; this mechanism is therefore capable of clustering any endogenous outer leaflet lipid species with long acyl chains if the inner leaflet lipid is sufficiently immobilized. Indeed, coupling PS to a stable actin template such as a stress fiber via a synthetic PS-actin bridge (Lact C2-Ez-YFP) also served to recruit endogenous GPI-APs and exogenously added long-chain lipids (F-DHPE), but not short-chain lipids. Given that nanoclusters formed by the contractile actin-based clustering machinery appear to be co-segregated (Goswami et al., 2008; van Zanten et al., 2009), this observation suggests that domains enriched in nanoclusters created by the transbilayer coupling mechanism will have lo-like character. Consistent with this, recent evidence from our group indicates that regions enriched in GPI-AP nanoclusters exhibit lo-like properties (Suvrajit Saha, A.A.A., and S.M., unpublished data). Together, these principles provide a very general mechanism whereby immobilizing an appropriate inner or outer leaflet lipid with long saturated acyl chains can help stabilize local lo domains even in a predominantly homogeneous, mixed ld membrane.

We show that the co-segregation of GPI and PS is achieved 
when GPI is clustered and immobilized in the outer leaflet while 
allowing PS to equilibrate and vice versa. This has implications 
for the construction of cell-surface signaling platforms or sorting 
platforms at the inner leaflet by crosslinking long saturated GPI- 
anchored proteins (Stefanova et al., 1991; Suzuki et al., 2007; 
Wolf et al., 1998). Here, the clustering of GPI-APs at the outer 
leaflet appears to build complexes at the inner leaflet to effect 
specific signaling reactions. Additionally, local lipid organization 
plays a crucial role in the nanoclustering of cell-surface Ras mol- 
ecules, thereby regulating signaling mechanisms locally (Ariotti 
et al., 2014). Inefficient coupling across the membrane can
impair several cell-signaling events and can contribute to immune
and neurological disorders. For instance, deletion of PGAP3
enhances T cell receptor signaling in PGAP3 knockout mice
(Murakami et al., 2012), and a mutation in PGAP3 leads to a subtype
of hyperphosphatasia with intellectual disability commonly referred
to as Mabry syndrome (Howard et al., 2014).

Finally, PS must in turn be connected to endogenous actin- 
binding proteins. The capacity of the synthetic PS and actin 
binding fusion protein (LactC2-Ez-YFP) to reconstruct actin- 
based nanoclustering provides strong support for this idea. 
Several candidate proteins of this kind exist, such as talin (Muguruma
et al., 1995), spectrin (An et al., 2004), caldesmon (Makuch et al.,
1997), myosin 1A (Mazerik and Tyska, 2012), and vinculin (Ito
et al., 1983). To fully elucidate the mechanism of transbilayer
coupling to actin, the identity of this coupling agent(s) needs
to be determined.

In conclusion, we show that lipidic interactions mediated by 
long-chain interdigitation in the presence of cholesterol stabilize 
a transbilayer connection between outer and inner leaflet lipids 
when either of the lipid species is immobilized. The deployment
of this mechanism in the mixed phase of the bilayer by active
clusters of actin filaments, which can engage PS at the inner
leaflet, provides a general mechanism to stabilize these lo
domains locally. The formation of the contractile actin clusters
would then determine when and where these domains may be 
stabilized, bringing the generation of membrane domains in 
live cells under control of the acto-myosin signaling network. 

EXPERIMENTAL PROCEDURES 

Detailed experimental conditions are provided in the Extended Experimental 
Procedures in the Supplemental Information. 

Plasmids, Cell Lines, Antibodies, and Other Reagents 

CHO cell lines stably expressing folate receptor (IA2.2; Mayor and Maxfield,
1995) or carrying mutations in PGAP2 and PGAP3 (PGAP2/3; Maeda et al., 2007),
PSA3 cells (Nishijima et al., 1986), C term Ez GFP, and PH-GFP were
obtained from several sources as indicated in the Supplemental Information.
IA2.2 cells, PSA3 cells, PGAP2/3 double-mutant cells, and FR-TM-Ez cells
were maintained in Ham’s F12 medium with an appropriate concentration of
antibiotics as indicated in the Supplemental Information.

GPI and PS Analogs 

The synthesis of fluorescently tagged GPI analogs and PLA2-resistant PS
analogs is described in detail in the Extended Experimental Procedures.

GPI Analog and Lipid Incorporation 

Synthetic GPI analogs were incorporated into cell membranes by the γ-CD
method (Koivusalo et al., 2007), whereas PS analogs were incorporated by the
Lipofectamine method as described (Saha et al., 2015).

Diffusion Measurements 

Fluorescence correlation spectroscopy (FCS) and fluorescence recovery after 
photobleaching (FRAP) measurements to determine proper incorporation 
were performed as described previously (Gowrishankar et al., 2012). 
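For orientation, FCS data from a membrane are commonly fit with a two-dimensional free-diffusion model, G(τ) = (1/N)·(1 + τ/τ_D)^-1, and the diffusion coefficient then follows from D = w0²/(4τ_D), where w0 is the radial beam waist. The sketch below encodes only these textbook relations. The function names, the beam-waist value, and the example τ_D are hypothetical, and the fitting model actually used (Gowrishankar et al., 2012) may differ.

```python
import numpy as np

def fcs_2d_model(tau, n_particles, tau_d, offset=0.0):
    """Autocorrelation for free 2D diffusion through a Gaussian focal spot.

    G(tau) = (1 / N) * 1 / (1 + tau / tau_D) + offset
    tau: lag times (s); n_particles: mean number of molecules in the focal area;
    tau_d: characteristic diffusion time (s).
    """
    tau = np.asarray(tau, dtype=float)
    return (1.0 / n_particles) / (1.0 + tau / tau_d) + offset

def diffusion_coefficient(beam_waist_um, tau_d_s):
    """D in um^2/s from the radial beam waist w0 (um) and tau_D (s): D = w0^2 / (4 tau_D)."""
    return beam_waist_um ** 2 / (4.0 * tau_d_s)

# Hypothetical example: w0 = 0.25 um and a fitted tau_D of 5 ms.
lags = np.logspace(-6, 0, 7)
curve = fcs_2d_model(lags, n_particles=10, tau_d=5e-3)
print(curve)
print("D (um^2/s):", diffusion_coefficient(0.25, 5e-3))
```

At τ = τ_D the model falls to half its zero-lag amplitude. This is why τ_D, together with a calibrated w0, fixes D.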

PI-PLC Treatment 

PI-PLC was purified in the laboratory from Bacillus thuringiensis as reported 
(Kobayashi et al., 1996). Cells cooled on ice were incubated with PI-PLC 
(0.5 U/ml) for 1 hr, then washed with M1 and imaged live.

Anisotropy Measurements 

Steady-state homo-FRET-based anisotropy measurements were carried 
out on a Nikon TE2000 epifluorescence microscope equipped with an Andor
TuCam dual camera imaging arrangement in the TIRF, spinning-disc confocal, 
or wide-field mode (Ghosh et al., 2012). 
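Steady-state anisotropy is conventionally computed from emission images polarized parallel and perpendicular to the excitation, using r = (I∥ - G·I⊥)/(I∥ + 2G·I⊥). A decrease in r relative to a monomeric reference signals homo-FRET between closely spaced fluorophores. The pixel-wise sketch below assumes background-corrected, registered polarization images. The G-factor default and the function name are hypothetical and not taken from the authors' calibration.

```python
import numpy as np

def steady_state_anisotropy(i_par, i_perp, g_factor=1.0, background=0.0):
    """Pixel-wise steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp).

    i_par, i_perp: intensity images (same shape) recorded parallel and
    perpendicular to the excitation polarization.
    g_factor: correction for unequal detection efficiency of the two channels
    (hypothetical default; it must be calibrated for a real setup).
    background: constant offset subtracted from both channels.
    """
    par = np.asarray(i_par, dtype=float) - background
    perp = np.asarray(i_perp, dtype=float) - background
    denom = par + 2.0 * g_factor * perp
    # Pixels with no signal get NaN instead of raising a division warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, (par - g_factor * perp) / denom, np.nan)
    return r

# Toy 1x2 image: the second pixel is more depolarized, so its anisotropy is lower.
print(steady_state_anisotropy([[300.0, 200.0]], [[100.0, 100.0]]))
```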

Treatments to Perturb Inner Leaflet Lipids, PIP2 and PS 

To perturb PIP2, CHO cells were treated with either neomycin (10 μM) or
chlorpromazine (10 μM) for 15 min at 37°C as described (Arbuzova et al., 2000;
Raucher and Sheetz, 2001). For perturbation of PS levels, PSA3 cells were
grown under replete (cells grown in the presence of 10 μM ethanolamine) or
deplete (cells grown in dialyzed serum for 48 hr) conditions. PS levels were
measured by assessing the extent of Annexin V binding as detailed in the Sup- 
plemental Information. 

Lipid Analysis and Mass Spectrometry Experiments

Lipid analysis was carried out on FACS-sorted cells expressing specific
transgenes (Lact C2-GFP or GFP) or on membrane blebs prepared from cells as
detailed previously (Pick et al., 2005). Lipids were extracted by the Bligh
and Dyer method (Bligh and Dyer, 1959), and mass spectrometry measurements
were carried out on an LTQ Orbitrap XL hybrid mass spectrometer
(Thermo Fisher Scientific).

Atomistic MD Simulations

We perform an all-atom MD simulation of the asymmetric, multi-component
bilayer with POPC, cholesterol (Chol), and PSM in the upper leaflet and
POPC and cholesterol in the lower leaflet at 23°C. GPI-AP and PS are inserted
in the upper and lower leaflets, respectively. The relative composition of the
bilayer is varied, and the detailed procedures and simulation
conditions are provided in the Supplemental Information. The membrane
bilayers were equilibrated and confirmed to be mechanically stable before
the distribution of the various constituents was determined. Immobilization
of a molecular species is achieved by assigning it a very large mass without
altering other features of the simulation (Supplemental Information).
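The text states only that a species is immobilized by giving it a large mass. A one-dimensional toy integration shows why this works: under identical forces, the acceleration a = F/m of a heavy particle is negligible, so it barely moves while a light particle oscillates freely. The harmonic force, masses, and time step below are arbitrary choices for illustration. They are not parameters of the study's simulations, and this velocity-Verlet loop is not the MD engine that was used.

```python
import numpy as np

def velocity_verlet(mass, x0, v0, force, dt=0.01, n_steps=1000):
    """Integrate one particle in 1D with velocity Verlet; return its trajectory."""
    x, v = float(x0), float(v0)
    a = force(x) / mass
    traj = np.empty(n_steps + 1)
    traj[0] = x
    for i in range(1, n_steps + 1):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        traj[i] = x
    return traj

def harmonic(x, k=1.0):
    # Hypothetical restoring force standing in for interactions with neighbors.
    return -k * x

# Same start and same force; only the mass differs.
light = velocity_verlet(mass=1.0, x0=1.0, v0=0.0, force=harmonic)
heavy = velocity_verlet(mass=1e6, x0=1.0, v0=0.0, force=harmonic)
print("max displacement, light:", np.max(np.abs(light - 1.0)))
print("max displacement, heavy:", np.max(np.abs(heavy - 1.0)))
```

Because only the inertia changes, the heavy particle still exerts normal forces on its neighbors. This is what lets it act as a fixed anchor without altering the interaction potentials.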

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures and 
seven figures and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.03.048.
Figure 7. Crosslinking of Either FR-GPI at the Outer Leaflet or PS at the Inner Leaflet Demonstrates a Strong Transbilayer Coupling

(A) Cropped images of membrane blebs obtained after jasplakinolide treatment of cells expressing Lact C2 GFP or PH GFP (left, green in merge) either with (treated)
or without mβCD (control), followed by cross-linking FR-GPI or FR-TM-Ez (middle, as indicated; red in merge) with primary and secondary antibodies to create
micron-sized patches of these proteins.

(B) Graph shows the extent of correlation between the intensity fluctuation of crosslinked FR-GPI or FR-TM-Ez and PH GFP or Lact C2 GFP both in the presence
(+) and absence (-) of mβCD, determined from images of blebs (pooled from three independent experiments) as shown in (A).

(C) Images and normalized line intensity profiles of Lact C2-Ez-YFP transfected in IA2.2 cells labeled with PLB or DHPE or C8 GPI analog as indicated. This shows
a strong colocalization in the distribution of Lact C2-Ez-YFP with FR-GPI and DHPE, but not with C8 GPI analog. Red line in (C) depicts the region of line scan
measurement. Scale bar, 10 μm.

The whiskers represent the outliers. See also Figure S7. 



Cell 161, 581-594, April 23, 2015 ©2015 Elsevier Inc. 593 




Cell 



Kumari, S., and Mayor, S. (2008). ARF1 is directly involved in dynamin-independent endocytosis. Nat. Cell Biol. 10, 30-41.

Maeda, Y., Tashima, Y., Houjou, T., Fujita, M., Yoko-o, T., Jigami, Y., Taguchi, R., and Kinoshita, T. (2007). Fatty acid remodeling of GPI-anchored proteins is required for their raft association. Mol. Biol. Cell 18, 1497-1506.

Makuch, R., Zasada, A., Mabuchi, K., Krauze, K., Wang, C.L.A.L., and Dabrowska, R. (1997). Phosphatidylserine liposomes can be tethered by caldesmon to actin filaments. Biophys. J. 73, 1607-1616.

Marchetti, M.C., Joanny, J.F., Ramaswamy, S., Liverpool, T.B., Prost, J., Rao, M., and Simha, R.A. (2013). Hydrodynamics of soft active matter. Rev. Mod. Phys. 85, 1143-1189.

Mayor, S., and Maxfield, F.R. (1995). Insolubility and redistribution of GPI-anchored proteins at the cell surface after detergent treatment. Mol. Biol. Cell 6, 929-944.

Mayor, S., and Pagano, R.E. (2007). Pathways of clathrin-independent endocytosis. Nat. Rev. Mol. Cell Biol. 8, 603-612.

Mayor, S., and Riezman, H. (2004). Sorting GPI-anchored proteins. Nat. Rev. Mol. Cell Biol. 5, 110-120.

Mazerik, J.N., and Tyska, M.J. (2012). Myosin-1A targets to microvilli using multiple membrane binding motifs in the tail homology 1 (TH1) domain. J. Biol. Chem. 287, 13104-13115.

McConville, M.J., and Ferguson, M.A. (1993). The structure, biosynthesis and function of glycosylated phosphatidylinositols in the parasitic protozoa and higher eukaryotes. Biochem. J. 294, 305-324.

Muguruma, M., Nishimuta, S., Tomisaka, Y., Ito, T., and Matsumura, S. (1995). Organization of the functional domains in membrane cytoskeletal protein talin. J. Biochem. 117, 1036-1042.

Murakami, H., Wang, Y., Hasuwa, H., Maeda, Y., Kinoshita, T., and Murakami, Y. (2012). Enhanced response of T lymphocytes from Pgap3 knockout mouse: Insight into roles of fatty acid remodeling of GPI anchored proteins. Biochem. Biophys. Res. Commun. 417, 1235-1241.

Nishijima, M., Kuge, O., and Akamatsu, Y. (1986). Phosphatidylserine biosynthesis in cultured Chinese hamster ovary cells. I. Inhibition of de novo phosphatidylserine biosynthesis by exogenous phosphatidylserine and its efficient incorporation. J. Biol. Chem. 261, 5784-5789.

Percy, A.K., Moore, J.F., Carson, M.A., and Waechter, C.J. (1983). Characterization of brain phosphatidylserine decarboxylase: localization in the mitochondrial inner membrane. Arch. Biochem. Biophys. 223, 484-494.

Pick, H., Schmid, E.L., Tairi, A., Ilegems, E., Hovius, R., and Vogel, H. (2005). Investigating cellular signaling reactions in single attoliter vesicles. J. Am. Chem. Soc. 8, 2908-2912.

Polley, A., Vemparala, S., and Rao, M. (2012). Atomistic simulations of a multicomponent asymmetric lipid bilayer. J. Phys. Chem. B 116, 13403-13410.

Polley, A., Mayor, S., and Rao, M. (2014). Bilayer registry in a multicomponent asymmetric membrane: dependence on lipid composition and chain length. J. Chem. Phys. 141, 064903.

Prior, I.A., Muncke, C., Parton, R.G., and Hancock, J.F. (2003). Direct visualization of Ras proteins in spatially distinct cell surface microdomains. J. Cell Biol. 160, 165-170.

Raucher, D., and Sheetz, M.P. (2001). Phospholipase C activation by anesthetics decreases membrane-cytoskeleton adhesion. J. Cell Sci. 114, 3759-3766.

Saha, S., Raghupathy, R., and Mayor, S. (2015). Homo-FRET imaging highlights the nanoscale organization of cell surface molecules. Methods Mol. Biol. 1251, 151-173.

Sengupta, P., Jovanovic-Talisman, T., Skoko, D., Renz, M., Veatch, S.L., and Lippincott-Schwartz, J. (2011). Probing protein heterogeneity in the plasma membrane using PALM and pair correlation analysis. Nat. Methods 8, 969-975.

Sharma, P., Varma, R., Sarasij, R.C., Ira, Gousset, K., Krishnamoorthy, G., Rao, M., and Mayor, S. (2004). Nanoscale organization of multiple GPI-anchored proteins in living cell membranes. Cell 116, 577-589.

Simons, K., and Ikonen, E. (1997). Functional rafts in cell membranes. Nature 387, 569-572.

Simons, K., and Vaz, W.L.C. (2004). Model systems, lipid rafts, and cell membranes. Annu. Rev. Biophys. Biomol. Struct. 33, 269-295.

Singer, S.J., and Nicolson, G.L. (1972). The fluid mosaic model of the structure of cell membranes. Science 175, 720-731.

Stauffer, T.P., Ahn, S., and Meyer, T. (1998). Receptor-induced transient reduction in plasma membrane PtdIns(4,5)P2 concentration monitored in living cells. Curr. Biol. 8, 343-346.

Stefanova, I., Horejsi, V., Ansotegui, I.J., Knapp, W., and Stockinger, H. (1991). GPI-anchored cell-surface molecules complexed to protein tyrosine kinases. Science 254, 1016-1019.

Suzuki, K.G.N., Fujiwara, T.K., Sanematsu, F., Iino, R., Edidin, M., and Kusumi, A. (2007). GPI-anchored receptor clusters transiently recruit Lyn and G alpha for temporary cluster immobilization and Lyn activation: single-molecule tracking study 1. J. Cell Biol. 177, 717-730.

Swamy, M.J., Ciani, L., Ge, M., Smith, A.K., Holowka, D., Baird, B., and Freed, J.H. (2006). Coexisting domains in the plasma membranes of live cells characterized by spin-label ESR spectroscopy. Biophys. J. 90, 4452-4465.

van Zanten, T.S., Cambi, A., Koopman, M., Joosten, B., Figdor, C.G., and Garcia-Parajo, M.F. (2009). Hotspots of GPI-anchored proteins and integrin nanoclusters function as nucleation sites for cell adhesion. Proc. Natl. Acad. Sci. USA 106, 18557-18562.

Varma, R., and Mayor, S. (1998). GPI-anchored proteins are organized in submicron domains at the cell surface. Nature 394, 798-801.

Wolf, A.A., Jobling, M.G., Wimer-Mackin, S., Ferguson-Maltzman, M., Madara, J.L., Holmes, R.K., Lencer, W.I., Ruston, S., Madara, J.L., and Hirst, T. (1998). Ganglioside structure dictates signal transduction by cholera toxin and association with caveolae-like membrane domains in polarized epithelia. J. Cell Biol. 141, 917-927.

Yeung, T., Gilbert, G.E., Shi, J., Silvius, J., Kapus, A., and Grinstein, S. (2008). Membrane phosphatidylserine regulates surface charge and protein localization. Science 319, 210-213.

Yin, H.L., and Janmey, P.A. (2003). Phosphoinositide regulation of the actin cytoskeleton. Annu. Rev. Physiol. 65, 761-789.







Article 



Cell

A Lactate-Induced Response to Hypoxia 

Authors 

Dong Chul Lee, Hyun Ahm Sohn 

Kyung Chan Park, Young Il Yeom

Correspondence 
kpark@kribb.re.kr (K.C.P.), 
yeomyi@kribb.re.kr (Y.I.Y.) 

In Brief 

Lactate, a common product of anaerobic 
metabolism, can promote a hypoxic 
response independent of HIF. It does so 
by binding and stabilizing the NDRG3
protein that, in turn, triggers signals for 
cell growth and angiogenesis. 



Highlights 

• NDRG3 is an oxygen-regulated substrate of the PHD2/VHL pathway

• Lactate binds to NDRG3, boosting its levels in hypoxia 

• NDRG3 activates Raf-ERK signaling to mediate lactate- 
triggered hypoxia responses 



Accession Numbers 

GSE55214 



Graphical Abstract

[Schematic; recoverable labels: NDRG3, PHD2, VHL, Degradation, ERK1/2, Growth, Angiogenesis]



Lee et al., 2015, Cell 161, 595-609
April 23, 2015 ©2015 Elsevier Inc.

http://dx.doi.org/10.1016/j.cell.2015.03.011



Cell Press






A Lactate-Induced Response to Hypoxia 

Dong Chul Lee, Hyun Ahm Sohn, Zee-Yong Park, Sangho Oh, Yun Kyung Kang, Kyoung-min Lee, Minho Kang,

Ye Jin Jang, Suk-Jin Yang, Young Ki Hong, Hanmi Noh, Jung-Ae Kim, Dong Joon Kim, Kwang-Hee Bae,

Dong Min Kim, Sang J. Chung, Hyang Sook Yoo, Dae-Yeul Yu, Kyung Chan Park,* and Young Il Yeom*

Medical Genomics Research Center

Ochang Branch Institute

Korean Bioinformation Center

Research Center for Integrative Cellulomics

Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 305-806, Korea
Department of Life Science, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
Department of Pathology, Inje University Seoul Paik Hospital, Seoul 100-032, Korea
Department of Functional Genomics, Korea University of Science and Technology, Daejeon 305-350, Korea
*Correspondence: kpark@kribb.re.kr (K.C.P.), yeomyi@kribb.re.kr (Y.I.Y.) 



SUMMARY 

Organisms must be able to respond to low oxygen in 
a number of homeostatic and pathological contexts. 
Regulation of hypoxic responses via the hypoxia- 
inducible factor (HIF) is well established, but evi- 
dence indicates that other, HIF-independent mecha- 
nisms are also involved. Here, we report a hypoxic 
response that depends on the accumulation of 
lactate, a metabolite whose production increases in 
hypoxic conditions. We find that the NDRG3 protein 
is degraded in a PHD2/VHL-dependent manner in 
normoxia but is protected from destruction by bind- 
ing to lactate that accumulates under hypoxia. The 
stabilized NDRG3 protein binds c-Raf to mediate
hypoxia-induced activation of the Raf-ERK pathway,
promoting angiogenesis and cell growth. Inhibiting 
cellular lactate production abolishes the NDRG3- 
mediated hypoxia responses. Our study, therefore, 
elucidates the molecular basis for lactate-induced 
hypoxia signaling, which can be exploited for the 
development of therapies targeting hypoxia-induced 
diseases. 

INTRODUCTION 

Oxygen homeostasis is essential for metazoan physiology. 
Under low oxygen conditions, cells resort to hypoxia-induced 
responses to adapt to and survive harsh environments (Cassa- 
vaugh and Lounsbury, 2011). Hypoxia responses are an integral 
part of normal physiology during embryonic development and 
postnatal life. They are also pathophysiologic components of 
many disorders, including cancer, inflammation, and cardiovas- 
cular diseases. 

Hypoxia inducible factors (HIFs) play central roles in hypoxia 
responses by controlling the expression of a host of hypoxia- 
responsive genes functioning in diverse processes, including 
metabolism, oxygen delivery, pH regulation, angiogenesis, cell 
proliferation, and survival (Harris, 2002; Cassavaugh and Lounsbury,
2011). In particular, the HIF-mediated upregulation of
glycolysis and suppression of the citric acid (TCA) cycle is a
crucial adaptive response at the early stage of hypoxia (Cassavaugh
and Lounsbury, 2011). The expression and activity of
HIFs are tightly regulated by oxygen-dependent hydroxylation
of their α subunits (Semenza, 2003).

Growing evidence indicates that hypoxia has many aspects 
that are not explained by HIF-mediated mechanisms alone. For 
example, the inhibition of HIF-mediated pathways does not always
prevent tumor growth; tumors derived from HIF-1α-deficient
embryonic stem (ES) cells have growth advantages owing
to decreased hypoxia-induced apoptosis and increased stress-induced
proliferation (Carmeliet et al., 1998). A number of reports
suggest that tumor angiogenesis constitutes the major pathway
of HIF-independent tumorigenesis. For example, angiogenesis was
preserved when HIF1A was knocked out in ES cells (Hopfl
et al., 2002). Several lines of evidence indicate that the
pro-angiogenic factor vascular endothelial growth factor (VEGF)
can be induced via both HIF-dependent and HIF-independent
pathways (Mizukami et al., 2004). Induction of other
pro-angiogenic factors such as IL-8 preserves the angiogenic response
in HIF-1α-deficient colon cancer cells (Mizukami et al., 2005).
Moreover, multiple pathways and transcription factors (TFs)
other than HIFs are known to respond to hypoxia and induce
biological responses in a HIF-independent manner. Among those
oxygen-regulatable TFs are NF-κB, AP-1, and CEBP, which
are activated in hypoxia (Cummins and Taylor, 2005). In addition,
several reports demonstrated that some of the genes
regulated by hypoxia were not regulated by HIFs, suggesting a 
role for other oxygen-regulated pathways that are, similar to 
HIF pathways, controlled by prolyl hydroxylase domain (PHD) 
enzymes (Elvidge et al., 2006). Also, a number of protein kinases 
such as PKA, PKC, PI3K, AKT, JNK, PTK2B (Pyk2), SRC, 
MAPK14 (p38), and ERK1/2 are reported to be activated in hyp- 
oxia (Seta et al., 2002). However, despite all of these studies, key 
elements and mechanisms responsible for oxygen-dependent 
regulation of the HIF-independent branch of hypoxia responses 
remain elusive. 

In this study, we identified an oxygen-regulated protein, 
NDRG3 (NDRG family member 3; NM_032013), as a bona fide 
substrate of the PHD2/VHL system. NDRG3 was highly induced
under oxygen-limited conditions in diverse cell types, although
its mRNA expression was independent of HIF levels under
hypoxia. Interestingly, NDRG3 required binding by the glycolytic
end-product lactate for its hypoxic accumulation, rendering its
expression indirectly dependent on HIF expression, as HIF-1α
regulates the hypoxic expression of lactate dehydrogenase A
(LDHA). We found that NDRG3 plays critical roles in lactate-
induced hypoxia signaling by mediating the activation of the 
Raf-ERK pathway to promote angiogenesis and cell growth dur- 
ing prolonged hypoxia. Thus, NDRG3 provides a critical genetic 
element for the oxygen- and lactate-dependent regulation of 
prolonged hypoxia responses. 

RESULTS 

Identification of NDRG3 as the Substrate of PHD2 

To identify the regulators of hypoxia responses, we searched for 
PHD2-binding proteins in MCF-7 cells expressing Flag-tagged 
PHD2 via Flag-mediated immunoprecipitation coupled to mass 
spectrometry. Among the candidates enriched in the protein 
bands reproducibly exhibiting differential immunoprecipitation 
patterns between mock and PHD2-Flag fractions, we chose 
NDRG3 for further studies since it belongs to a gene family impli- 
cated in cell proliferation, migration, and invasion as well as in 
differentiation and development (Melotte et al., 2010), which 
are biological features closely associated with hypoxia (Harris,
2002; Cassavaugh and Lounsbury, 2011) (Figure S1A).

To characterize NDRG3 in detail, we developed an affinity-purified
polyclonal antibody specific to NDRG3 among the human
NDRG family members (Figure S1B). This antibody detected
NDRG3 as a 42-kDa band in the PHD2-Flag immunoprecipitation
fraction (Figure 1A). We verified the NDRG3-PHD2 interaction
by immunoprecipitating endogenous NDRG3 with PHD2-Flag
from HeLa cells grown under hypoxia (Figure 1B) and
directly by a pull-down assay using recombinant PHD2-His
and NDRG3-GST proteins (Figure S1C). Thus, we concluded
that NDRG3 is a bona fide PHD2-binding protein.

We then examined possible functional relationships between
PHD2 and NDRG3 using a PHD inhibitor, desferrioxamine
(DFX). Although the basal-level expression of NDRG3 was negligible,
PHD inhibition caused its dose-dependent accumulation
in HeLa (Figure 1C) and MCF-7 cells (Figure S1D). These results
were reproducible with two other PHD inhibitors, dimethyloxaloylglycine
(DMOG) and CoCl2 (Figure S1E), suggesting that
NDRG3 protein expression might be under PHD-mediated
posttranslational control. We then examined different PHD family
members for their involvement in the regulation of NDRG3 by
silencing their expression under normoxia using small interfering
RNAs (siRNAs). This analysis revealed that, as in the case
of HIF-1α, PHD2 is the major regulator of NDRG3 expression
among the PHD family members (Figure 1D, left). This was supported
by the identification of differential interactions between
NDRG3 and PHD2 in a co-immunoprecipitation assay (Figure S1F).
Depletion of VHL, the targeting element of the E3 ubiquitin
ligase complex, also caused NDRG3 accumulation under normoxia
(Figure 1D, right), suggesting that NDRG3 is likely a
target of PHD2/VHL-mediated posttranslational modification.
To address this point more thoroughly, we prepared several variants
of the NDRG3 protein carrying single amino acid changes in
their putative PHD2-docking site, predicted from a docking
model between a putative NDRG3 structure and the published
PHD2 structure (Chowdhury et al., 2009) (Figure S1G). A co-immunoprecipitation
assay showed that the NDRG3 mutants
could be ranked according to their PHD2-binding strengths in
the following order: V296D > Q97E > R47D ~ N66D, which,
interestingly, appeared to be inversely correlated with their protein
expression levels in normoxia (Figure 1E). Moreover, NDRG3
variants retaining higher affinity for PHD2 co-immunoprecipitated
a higher amount of HA-tagged VHL protein (Figure 1E),
indicating that the interaction of NDRG3 with PHD2 and VHL
is a critical determinant of its protein expression. Next, in an
in vivo ubiquitination assay, the amount of ubiquitin immunoprecipitated
with NDRG3 was increased by overexpression of
NDRG3, while it was decreased by silencing of its expression
by different short hairpin RNAs (shRNAs) (Figures 1F and S1H).
In addition, proteasome inhibition with MG132 dramatically
increased the detected levels of NDRG3 in HeLa cells (Figure S1I).
Collectively, these results demonstrate that NDRG3 is
a PHD2-interacting protein whose expression is negatively regulated
by PHD2/VHL-mediated proteasomal pathways.

Oxygen-Dependent Regulation of NDRG3 Protein 
Expression 

Since PHD2 critically depends on O2 availability for its activity,
we examined whether NDRG3 protein expression is regulated
in an oxygen-dependent manner. NDRG3 accumulated in
MCF-7 cells at rates that were inversely correlated with O2
concentrations (Figures 2A and S2A). Consistent with this, NDRG3
ubiquitination was significantly suppressed in HeLa cells under
hypoxia (Figure 2B). The hypoxic induction of NDRG3 was
demonstrated in cancer cells of diverse tissue origins as well
as in non-transformed cells (Figure S2B), suggesting that the
phenomenon is universal. However, in contrast to HIF-1α protein,
which showed a bell-shaped induction pattern at the early stage
of hypoxia, NDRG3 exhibited a sigmoidal expression pattern,
starting when HIF-1α levels began to decline and lasting until
later stages of hypoxia (Figure 2C). The hypoxic expression of
NDRG3 slowly diminished as cells were reoxygenated
(Figure S2C). These results strongly suggest that NDRG3 protein
expression is negatively regulated by oxygen.

Next, we investigated the molecular basis of the oxygen- 
dependent regulation of NDRG3 protein expression. Mass 
spectrometric analysis revealed that NDRG3 is specifically hy- 
droxylated at proline 294, suggesting that it might be the residue 
modified by PHD2 (Figure 2D). Site-directed mutagenesis of pro- 
line 294 to alanine (P294A) resulted in pronounced accumulation 
of the variant protein in normoxia (Figure 2E, left). Moreover, a 
co-immunoprecipitation assay showed that the P294A mutant 
protein possessed a significantly reduced binding affinity for 
PHD2 and VHL proteins compared to wild-type (Figure 2E, right), 
indicating that proline 294 is the critical target site of PHD2- 
mediated hydroxylation that determines NDRG3 protein stability 
in normoxia. 

Since the expression of HIF-1α immediately preceded that of
NDRG3 (Figure 2C), we investigated the possibility of HIF-1α
transcriptionally regulating NDRG3 expression during hypoxia.
Figure 1. NDRG3 Protein Is Regulated by PHD2/VHL-Mediated Proteasomal Pathway

(A) Identification of NDRG3 as a PHD2-binding protein in MCF-7 cells. 

(B) Validation of PHD2-NDRG3 interaction in HeLa cells under hypoxia. 

(C) Induction of NDRG3 protein by inhibiting PHDs with desferrioxamine (DFX) in HeLa cells at normoxia. Results are mean ± SD of three experiments. 

(D) Effects of depleting different PHD family members (left) or VHL (right) on NDRG3 protein expression in HeLa cells in normoxia. 

(E) Expression pattern of NDRG3 variants mutated in putative PHD2-binding sites and their interaction with PHD2 and VHL in HEK293T cells. 

(F) Ubiquitination assay of NDRG3 protein in HeLa cells at normoxia. 

See also Figure S1.



RT-PCR analysis showed that NDRG3 mRNA level remained 
virtually unchanged during hypoxia, even when HIF proteins 
reached their peak levels (Figure S2D). This result indicates the 
HIF independence of NDRG3 transcription and confirms the 
posttranslational nature of NDRG3 expression during hypoxia. 
Depletion of different subunits of HIF had no effect on NDRG3
mRNA levels, confirming the HIF independence of its transcription
(Figure 2F). It is noteworthy that although NDRG3 protein
expression in hypoxia was clearly detectable in HIF-silenced
cells, it was significantly reduced compared to control by HIF-1β
knockdown and, to a much lesser extent, by HIF-1α knockdown,
suggesting a potential non-transcriptional effect of the
HIF pathway on NDRG3 protein expression. By contrast, we
found that HIF acted as a transcriptional activator
for the hypoxic expression of another NDRG family member,
NDRG1 (Figure S2E). These results collectively indicate that
HIF activity is not required for the transcriptional regulation of
NDRG3 expression during hypoxia.

Role of NDRG3 in the Regulation of Hypoxia Responses 

We investigated the potential functions of NDRG3 in hypoxia 
by correlating its protein expression profile with the genomic 
Figure 2. NDRG3 Expression Is Regulated in an Oxygen-Dependent Manner

(A) NDRG3 protein expression at different oxygen concentrations in MCF-7 cells. Quantified values for western blot images are shown on the right. Results are 
mean ± SD of three experiments. 

(B) Oxygen dependency of NDRG3 protein ubiquitination in vivo. 

(C) NDRG3 protein expression in MCF-7 cells during prolonged hypoxia. Quantified values for western blot images are shown on the right. Results are mean ± SD 
of three experiments. 

(D) Site of prolyl hydroxylation in NDRG3 identified by micro-LC-MS/MS analysis. 

(E) Expression pattern of an NDRG3 variant mutated in putative prolyl hydroxylation site (P294A) (left) and its interaction with PHD2 and VHL proteins (right) in 
HEK293T cells. WT, wild-type. 

(F) Effects of silencing HIF proteins on hypoxic expression of NDRG3 in Huh-1 cells. 

See also Figure S2. 
activity profile of five gene ontology categories representative of
hypoxia responses (Figure S3A). The genomic activity of each gene
ontology category was estimated by gene set enrichment analysis,
whereby a standardized difference score (Z score) was calculated
from transcriptome expression data of Huh-7 cells at a
particular time point during hypoxia. The results showed that
NDRG3 protein expression was highly correlated with the activity
of “angiogenesis,” “anti-apoptosis,” “proliferation (positive),”
and “motility” functions but not with “glycolysis” (Figure 3A).
Conversely, depletion of NDRG3 at 24 hr under hypoxia,
when cellular NDRG3 protein expression would otherwise have
reached a significant level, caused significant changes in
the activity of “angiogenesis,” “anti-apoptosis,” “proliferation
(positive),” and “motility” categories but not that of “glycolysis”
(Figure 3B). In contrast, “glycolysis” was significantly targeted
by HIF-1α depletion at 6 hr under hypoxia, when HIF-1α
protein expression is expected to have reached its peak level
(Figure S3B). Consistently, the ectopic expression of a normoxia-stable
variant of NDRG3 (N66D in Figure 1E) caused
the upregulation (>1.5-fold) of genes having primary functions
in angiogenesis > proliferation ~ growth ~ apoptosis ~ migration
> glycolysis (Figure S3C).
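The exact scoring formula is not given here. One common way to turn a single expression profile into a standardized gene-set score is the parametric analysis of gene set enrichment (PAGE) statistic: Z = (mean of the set's log fold changes - mean over all genes) · √m / SD over all genes, where m is the number of set members. The sketch below implements that formula on the assumption that it approximates the "standardized difference score" described above. The gene names and values are hypothetical, and the authors' procedure may differ.

```python
import math
import statistics

def gene_set_z_score(log_fold_changes, gene_set):
    """PAGE-style Z score: (set mean - global mean) * sqrt(m) / global SD.

    log_fold_changes: dict mapping gene -> log fold change (e.g., hypoxia vs. control)
    gene_set: iterable of gene names belonging to one GO category
    """
    values = list(log_fold_changes.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population SD over all measured genes
    members = [log_fold_changes[g] for g in gene_set if g in log_fold_changes]
    m = len(members)
    if m == 0 or sigma == 0:
        raise ValueError("empty gene set or zero variance")
    return (statistics.fmean(members) - mu) * math.sqrt(m) / sigma

# Hypothetical toy profile: the two set members are both upregulated.
profile = {"GENE_A": 2.0, "GENE_B": 2.0, "GENE_C": 0.0, "GENE_D": 0.0}
print(gene_set_z_score(profile, ["GENE_A", "GENE_B"]))
```

The √m factor scales the score with set size. A modest but consistent shift across a large GO category can therefore score as highly as a strong shift in a small one.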

We then experimentally evaluated the roles of NDRG3 in 
“angiogenesis,” “anti-apoptosis,” and “proliferation,” the func- 
tions often implicated in tumor growth and significantly targeted 
by NDRG3 depletion (Figure 3B). In tube formation assays using
HUVECs, NDRG3 depletion significantly suppressed
the angiogenic activity induced by hypoxia in Huh-7 cells
(Figure S3D). In parallel, the Matrigel plug assay showed that
NDRG3 knockdown inhibited the angiogenic activity of Huh-7 
cells in BALB/c-nu mice (Figure 3C). At the molecular level, 
hypoxia-induced expression of pro-angiogenic markers was 
abolished by NDRG3 depletion, while it was upregulated in nor- 
moxia by NDRG3(N66D) (Figure 3D). Next, examination of the 
anti-apoptotic activity of NDRG3 via caspase-3/7 and PARP 
cleavage assays indicated that NDRG3 depletion significantly 
promotes apoptosis in hypoxia (Figure 3E). Accordingly, the hyp- 
oxia-induced expression of anti-apoptotic genes, notably mem- 
bers of the lAP (inhibitor of apoptosis proteins) family, was abol- 
ished by NDRG3 depletion in Huh-7 cells (Figure 3F). Moreover, 
the depletion of NDRG3 using an shRNA targeting its 3'-UTR 
(Figure S1H, #5) significantly inhibited the growth of Huh-7 cells 
under mild hypoxia (3% O 2 ; Figure 3G), but this phenotype was 
effectively rescued by a recombinant NDRG3(N66D) expression 
vector lacking the natural 3'-UTR sequences of NDRG3 (Figures 
3G and S3E). In addition, NDRG3 knockdown strongly sup- 
pressed the tumorous growth of Huh-7 cells in BALB/c-nu 
mice (Figures 3H and S3F). Interestingly, simultaneous depletion 
of NDRG3 and either of the HIFs completely abrogated tumor 
growth, suggesting complementary roles for NDRG3 and HIFs 
in hypoxic cell growth (Figure 3H). Immunofluorescence micro- 
scopy of resected tumors revealed that NDRG3 depletion 
effectively suppressed the expression of markers of tumor 
angiogenesis (IL8 and CD31) and cell proliferation (Ki-67), while 
their levels in HIF-depleted tumors were comparable to those 
in controls (Figure S3G). In contrast, the ectopic expression of 
NDRG3(N66D) highly promoted colony formation of Huh-1 cells 
in soft agar (Figure S4J) as well as their tumorigenic activity in 



BALB/c-nu mice (Figures 3I and S3H). These results demon- 
strate that NDRG3 plays crucial roles in promoting angiogenesis, 
anti-apoptosis, and cell proliferation during hypoxia. 

L-Lactate Triggers the NDRG3-Mediated Hypoxia Responses

Compared to HIF-1α, which showed an early induction pattern during hypoxia and rapidly disappeared upon reoxygenation of cells, NDRG3 started accumulating relatively late in hypoxia, and its levels declined slowly upon reoxygenation (Figures 2C and S2C). The long lag periods observed for the accumulation and degradation of NDRG3 suggested that multiple layers of regulation might be involved in its hypoxic expression. We therefore explored biochemical features of "prolonged hypoxia" other than low oxygen levels and found that NDRG3 protein expression is highly correlated with cellular lactate production: NDRG3 protein expression began at ~6 hr under hypoxia, closely following the lactate production pattern (Figure 4A). Conversely, suppression of lactate production with the LDHA inhibitor sodium oxamate specifically inhibited NDRG3 protein accumulation in a dose-dependent manner (Figure 4B). Similarly, inhibition of lactate production via siRNA-mediated depletion of LDHA (Figure 4C) or disruption of glycolysis with 2-deoxyglucose (Figure S4A) suppressed hypoxic NDRG3 protein expression. Depriving cells of glucose and/or glutamine, the input substrates for glycolysis and glutaminolysis, respectively (the two major metabolic pathways leading to intracellular lactate production), also reduced NDRG3 protein accumulation, with a parallel reduction in lactate production but without affecting NDRG3 transcription (Figure 4D). However, compared to the marked consequences of glucose deprivation, the effect of glutamine deprivation was relatively minor. In contrast, facilitating lactate production (via LDHA overexpression and/or pyruvate overfeeding; Figure S4B) or its intracellular accumulation (by blocking export through MCT4; Figure 4C) augmented the hypoxic accumulation of NDRG3 protein. These results indicate that, unlike for HIF proteins, oxygen deprivation per se is not sufficient for NDRG3 protein accumulation; glycolytic production of lactate is additionally required.

We then verified the effects of lactate on NDRG3 protein dynamics more directly by providing exogenous lactate to cells whose intracellular lactate production had been compromised by genetic or pharmacological means. Lactate exogenously added to Huh-1 cells dose-dependently restored the hypoxic NDRG3 protein expression that had been reduced by LDHA silencing, without affecting the level of NDRG3 mRNA or HIF-1α protein (Figure 4E). Similar results were obtained when lactate production was suppressed by glucose deprivation (Figure S4C) or oxamate treatment (Figure S4D). However, the lactate-mediated restoration of NDRG3 protein expression was abrogated by siRNA targeting MCT1, a monocarboxylate transporter responsible for importing extracellular lactate into the cell, in both normoxic and hypoxic conditions (Figure 4F). We also observed similar effects of MCT1 knockdown in Huh-1 cells subjected to oxamate treatment or glucose deprivation (Figures S4E and S4F). Collectively, these results indicate that NDRG3 requires lactate build-up for its protein accumulation under hypoxia, pointing to the possibility that NDRG3 might function as a
Figure 3. NDRG3 Is a Critical Regulator of Prolonged Hypoxia Responses

(A) Correlation analyses between the NDRG3 protein expression during hypoxia and the activity of five representative hypoxia-responsive gene sets. 

(B) Changes in the activity of hypoxia-responsive gene sets upon NDRG3 silencing in Huh-7 cells at 24 hr under hypoxia (1% O2).

(C) Matrigel plug assay of NDRG3-mediated angiogenic activity. The p value was assessed by Student’s t test. 

(D) Regulation of pro-angiogenic gene expression by NDRG3. Gene expression in NDRG3-silenced Huh-7 cells under hypoxia (1% oxygen, left) or in NDRG3(N66D)-overexpressing HeLa cells at normoxia (right) was examined by RT-PCR.

(E) Effects of NDRG3 knockdown on hypoxia-induced apoptosis in Huh-7 cells. Results are mean ± SD of three experiments. The p value was assessed by Student’s t test.

(F) Regulation of hypoxia-induced anti-apoptotic gene expression by NDRG3. 

hypoxia-inducible lactate sensor, triggering HIF-independent
biologic responses in the cell. 

We therefore examined the functional significance of hypoxic NDRG3 expression in the context of lactate metabolism. Inhibition of lactate production with oxamate caused a dose-dependent suppression of Huh-1 cell growth under mild hypoxia (Figure S4G). However, this effect was effectively rescued by ectopic expression of NDRG3(N66D) (Figure S4G), suggesting that NDRG3 may play a critical role in lactate-induced hypoxic cell growth. NDRG3(N66D) had no direct effect on lactate production, irrespective of oxamate treatment (Figure S4H), nor was its expression affected by oxamate (Figure S4I), indicating that the rescue is intrinsic to NDRG3(N66D) itself. The effect of oxamate on cell growth and its rescue by NDRG3(N66D) were further corroborated by colony-forming assays using Huh-1 cells (Figure S4J). We then examined the role of NDRG3 in the growth of cells whose LDHA expression was ablated by RNAi. Depletion of LDHA by shRNA suppressed the growth of Huh-1 cells under mild hypoxia (Figure S4K) as well as their tumor growth in BALB/c-nu mice (Figures 4G and S4L). Again, NDRG3(N66D) effectively compensated for the LDHA deficit both in vitro and in vivo. Moreover, in tube-forming assays using HUVECs, oxamate suppressed the angiogenic activity induced in Huh-1 cells under hypoxia (Figure 4H). However, ectopic expression of NDRG3(N66D) restored the angiogenic activity of these cells despite the oxamate treatment. Thus, lactate appears to be a crucial signal for hypoxic cell growth and angiogenesis, and NDRG3 functions as a key mediator of the lactate-induced hypoxia responses.

Molecular Mechanism of the Lactate Regulation of NDRG3 Protein Expression

We investigated the molecular mechanism of lactate-induced NDRG3 protein accumulation by examining the effect of lactate on the ubiquitination of NDRG3. In vitro, lactate inhibited NDRG3 ubiquitination catalyzed by the PHD2/VHL complex immunoprecipitated from HEK293T cells (Figure 5A), indicating that lactate can block the modification of NDRG3 protein by PHD2/VHL. Because lactate did not affect HIF-1α protein expression under hypoxia (Figures 4 and S4), its effect appeared to be specific to NDRG3, and we therefore examined the possibility that lactate directly modulates NDRG3 by investigating interactions between the two molecules. An in vitro binding experiment using GST-tagged recombinant NDRG3 protein and [14C]-labeled L-lactate indicated that NDRG3 physically and directly binds lactate (Figures 5B and S5B). To examine the NDRG3-lactate interaction further, we predicted the putative lactate-binding domain of NDRG3 by docking simulation (not shown). Site-directed mutagenesis of the predicted lactate-binding domain showed that mutations in some of its amino acid residues can impair the hypoxic accumulation of the mutant proteins (Figure S5A). One variant, in which glycine-138 was mutated to tryptophan (N3(G138W) in Figure S5A), hardly accumulated under hypoxia but did accumulate in the presence of MG132, suggesting that it may have lost the lactate-binding capability necessary for escaping PHD2/VHL-mediated proteasomal degradation. Indeed, recombinant N3(G138W)-GST protein showed severely impaired lactate binding in an in vitro binding assay (Figures 5C and S5B). These results suggest that lactate binding inhibits proteasomal degradation of NDRG3 by blocking its modification by PHD2/VHL. Moreover, once formed, the NDRG3-lactate complex seems to remain quite resistant to PHD2/VHL-mediated modification, since hypoxically accumulated NDRG3 protein persisted for some time after the cells were transferred to fresh medium under normoxia (Figure S2C). In contrast, HIF-1α rapidly disappeared upon reoxygenation, demonstrating the exquisite oxygen dependency of its post-translational regulation.

We further investigated the mechanisms behind the lactate-induced changes in NDRG3 protein dynamics through protein binding analyses in HEK293T cells expressing epitope-tagged NDRG3, PHD2, and VHL. Binding between NDRG3 and PHD2 during early (6 hr) or late (24 hr) hypoxia did not differ significantly from that in normoxia, suggesting that neither low oxygen nor high lactate levels affect the NDRG3-PHD2 interaction (Figure 5D). By contrast, binding between NDRG3 and VHL was significantly reduced at 24 hr under hypoxia but was maintained at normoxic levels at 6 hr under hypoxia, indicating that high lactate levels, but not low oxygen levels, might affect the NDRG3-VHL interaction. We then verified these observations using NDRG3 variants defective in prolyl hydroxylation by PHD2 (P294A) or in lactate binding (G138W). None of the wild-type or variant NDRG3 species showed significant differences in PHD2-binding capacity between normoxia and hypoxia (24 hr) (Figure 5E). On the other hand, the VHL-binding capacity of wild-type NDRG3 was significantly reduced under hypoxia compared to normoxia, whereas those of P294A and G138W were barely changed by hypoxia. Notably, the interaction of P294A with VHL was negligible, whereas the G138W-VHL interaction was strongly maintained regardless of oxygen level. Consistently, ubiquitination of wild-type NDRG3 was significantly reduced under hypoxia, whereas P294A was negligibly ubiquitinated and G138W strongly ubiquitinated in both normoxia and hypoxia (Figure 5F). Conversely, oxamate treatment specifically augmented the hypoxic interaction of wild-type NDRG3 with VHL as well as its ubiquitination (Figures 5G and S5C). Inhibition of hypoxic lactate production by glucose deprivation also augmented the NDRG3-VHL interaction (Figure S5D). Addition of exogenous lactate to oxamate-treated cells specifically inhibited the NDRG3-VHL interaction in both normoxia and hypoxia (Figure S5E). Thus, we conclude that the NDRG3-PHD2 interaction is not affected by cellular oxygen or lactate levels, while the NDRG3-VHL



(G) Inhibition of hypoxic cell growth by NDRG3 knockdown and its rescue by NDRG3(N66D) overexpression. Results are mean ± SD of three experiments. The p 
value was assessed by Student’s t test. 

(H) Effects of the knockdown of NDRG3 and/or HIF-α on the tumorigenic activity of Huh-7 cells in vivo. The p value was assessed by Student’s t test.

(I) Tumorigenic activity of Huh-1 cells overexpressing NDRG3(N66D). The p value was assessed by Student’s t test. 

See also Figure S3. 
Figure 4. Lactate Signals for the NDRG3-Mediated Hypoxia Responses

(A) Intracellular NDRG3 protein accumulation and gross lactate production by MCF-7 cells during hypoxia. N, normoxia. Results are mean ± SD of three ex- 
periments. The p value was assessed by Student’s t test. 

(B) Effects of the pharmacological inhibition of lactate dehydrogenase on hypoxic lactate production and NDRG3 protein expression by MCF-7 cells. Results are 
mean ± SD of three experiments. The p value was assessed by Student’s t test. 

(C) Effects of depleting the genes of lactate metabolism on the hypoxic expression of NDRG3 protein and intracellular lactate levels in HeLa cells. Results are 
mean ± SD of three experiments. The p value was assessed by Student’s t test. 

(D) Hypoxic expression of NDRG3 and lactate production in Huh-1 cells deprived of glucose (Glc) or glutamine (Gln). Results are mean ± SD of three experiments. The p values were assessed by Student’s t test.

(E) Effects of exogenous lactate on the hypoxic expression of NDRG3 in LDHA-silenced Huh-1 cells. 

(F) Effects of MCT1 depletion on the NDRG3 protein expression induced by exogenous lactate in LDHA-silenced Huh-1 cells. 



interaction is significantly inhibited by lactate but not by low oxygen levels.

In summary, excess lactate built up during hypoxia directly binds to NDRG3 and inhibits its ubiquitination and proteasomal degradation by disrupting the NDRG3-VHL interaction. However, whether the failure of NDRG3 ubiquitination under high-lactate conditions results from inhibition of the PHD2-mediated hydroxylation of NDRG3 that is required for VHL binding remains to be determined.
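The binding and ubiquitination results (Figures 5D to 5G and S5E) can be condensed into a toy Boolean model, sketched below purely to illustrate this reading of the data, not as a mechanism proposed in the study. It assumes that VHL binding requires the hydroxylatable P294 residue and is blocked whenever lactate is bound, and that lactate binding requires an intact G138 residue plus high intracellular lactate. Oxygen is deliberately not an input, reflecting the observation that the NDRG3-VHL interaction tracked lactate rather than oxygen, and the model leaves open whether lactate acts by blocking hydroxylation. The class, function, and condition names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ndrg3Form:
    name: str
    hydroxylatable: bool   # P294 intact; P294A is defective in PHD2 hydroxylation
    lactate_binding: bool  # G138 intact; G138W has impaired lactate binding


WT = Ndrg3Form("WT", hydroxylatable=True, lactate_binding=True)
P294A = Ndrg3Form("P294A", hydroxylatable=False, lactate_binding=True)
G138W = Ndrg3Form("G138W", hydroxylatable=True, lactate_binding=False)

# Toy assumption: whether each condition yields high intracellular lactate.
CONDITIONS = {
    "normoxia": False,
    "hypoxia 6 hr": False,
    "hypoxia 24 hr": True,
    "hypoxia 24 hr + oxamate": False,
    "oxamate + exogenous lactate": True,
}


def binds_vhl(form: Ndrg3Form, high_lactate: bool) -> bool:
    """VHL recognition needs the hydroxylation site and is blocked by bound lactate."""
    lactate_bound = form.lactate_binding and high_lactate
    return form.hydroxylatable and not lactate_bound


def accumulates(form: Ndrg3Form, high_lactate: bool) -> bool:
    """Toy rule: VHL binding leads to ubiquitination and proteasomal degradation."""
    return not binds_vhl(form, high_lactate)


if __name__ == "__main__":
    for form in (WT, P294A, G138W):
        for label, high_lactate in CONDITIONS.items():
            print(f"{form.name:6} {label:28} VHL-bound={binds_vhl(form, high_lactate)}")
```

Under these assumptions the model reproduces the qualitative pattern described above: wild-type NDRG3 escapes VHL only when lactate is high, P294A never binds VHL, and G138W binds VHL in every condition.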

Activation of Raf-ERK Signaling by NDRG3 during Hypoxia

To understand the molecular mechanisms of NDRG3 function in hypoxia, we searched for possible NDRG3-regulated kinases via a phosphoarray analysis using PLC/PRF/5 cells stably expressing shRNA for NDRG3 or GFP (Figure S6A). NDRG3 depletion selectively suppressed hypoxia-induced ERK1/2 phosphorylation (Figures 6A and S6A). We then examined whether the kinases upstream of ERK1/2 could be regulated by NDRG3 and found that hypoxia-induced phosphorylation of c-Raf (at Ser338) and B-RAF1 (at Ser445) is abrogated by NDRG3 depletion in SK-Hep-1 cells (Figure 6B). These results suggest that NDRG3 might play an essential role in the activation of the Raf-ERK signaling pathway. We therefore examined the effect of manipulating NDRG3 expression on c-Raf phosphorylation and found that ectopically expressed c-Raf was significantly phosphorylated in normoxia, with concomitant phosphorylation of ERK1/2 (Figure 6C). However, depletion of basal NDRG3 expression by siRNA abrogated this response. Conversely, ectopic expression of NDRG3(N66D) strongly induced the phosphorylation of c-Raf and ERK1/2 (Figure 6C). Also, the hypoxia-induced phosphorylation of endogenous c-Raf and ERK1/2, which was suppressed by the 3'-UTR-targeting shRNA of NDRG3, could be rescued by the recombinant NDRG3(N66D) expression vector (Figure 6D). Reciprocal in vitro pull-down assays indicated that NDRG3 can physically and directly interact with c-Raf (Figure S6B). Consistently, ectopically expressed c-Raf immunoprecipitated endogenous NDRG3 protein specifically under hypoxia (Figure S6C). Moreover, an NDRG3(N66D)-containing complex immunoprecipitated from HEK293T cells mediated the phosphorylation of recombinant c-Raf in an in vitro kinase assay (Figure 6E). These results indicate that NDRG3 is directly involved in the phosphorylation of c-Raf.

We then examined the biological implications of NDRG3-mediated c-Raf-ERK1/2 phosphorylation. Immunoprecipitation of endogenous proteins revealed increasing amounts of c-Raf-NDRG3 complexes as hypoxia progressed, in parallel with a temporal increase in the phosphorylation levels of c-Raf and ERK1/2 (Figure 6F). This result suggests a potential role for NDRG3-mediated c-Raf-ERK1/2 phosphorylation in the regulation of hypoxia responses. Ablation of LDHA to inhibit lactate production effectively suppressed the hypoxia-induced phosphorylation of c-Raf and ERK1/2 as well as NDRG3 protein expression (Figure 6G). Exogenously provided lactate rescued the siLDHA-mediated suppression of c-Raf and ERK1/2 phosphorylation, but this rescue was blocked by silencing MCT1 expression. In addition, disruption of glycolysis via glucose deprivation effectively suppressed the hypoxic phosphorylation of c-Raf and ERK1/2 as well as NDRG3 expression, which could be rescued by NDRG3(N66D) (Figure 6H). In contrast, glutamine deprivation had negligible effects. These results indicate that hypoxia-induced phosphorylation of c-Raf and ERK1/2 depends on lactate production, mainly from glycolysis, and that NDRG3 functions as an essential mediator of the lactate-induced activation of the Raf-ERK pathway.

Dependence of Lactate-Induced Hypoxic Cell Growth and Angiogenesis on NDRG3-Mediated ERK1/2 Activity

Finally, we examined the biological relevance of the NDRG3-mediated activation of the Raf-ERK pathway to lactate-triggered hypoxia responses. Exogenously provided lactate significantly compensated for the growth deficit caused by LDHA silencing in Huh-1 cells under mild hypoxia (Figure 7A). However, the lactate-mediated rescue was abrogated by depletion of NDRG3 or pharmacological blockade of ERK signaling. Similarly, exogenous lactate restored the hypoxia-induced angiogenic capacity of LDHA-knockdown Huh-1 cells in tube-forming assays, and this restoration was again abolished by NDRG3 depletion or ERK inhibition (Figure 7B). In parallel, the hypoxic expression of angiogenic marker genes, disrupted by LDHA knockdown, was recovered by exogenous lactate but disrupted again by NDRG3 depletion or ERK inhibition (Figure S7A). We then examined the relevance of NDRG3-mediated Raf-ERK pathway activation to tumor growth in vivo. Western blot analysis of the tumors formed by Huh-1 cells engineered for LDHA and/or NDRG3 expression in the in vivo tumorigenesis analysis (Figures 4G and S4L) indicated that phosphorylation of c-Raf and ERK was clearly upregulated in tumors expressing NDRG3(N66D) compared to mock controls (Figures 7C and S7B). Consistently, we observed elevated expression of angiogenic marker genes in NDRG3(N66D)-expressing tumors. These results, together with those in Figures 4 and 6, demonstrate that lactate plays essential roles in promoting cell growth and angiogenesis under hypoxia, depending on NDRG3-mediated activation of the c-Raf-ERK1/2 pathway.

We then examined the clinical relevance of NDRG3 expression and ERK1/2 activity by immunohistochemical analysis of human hepatocellular carcinoma (HCC). NDRG3 was barely expressed in normal liver, whereas moderate to strong levels were detectable in the cytoplasm and plasma membrane in HCC tissues (Figure 7D). Among 103 HCC cases examined using antibodies against NDRG3 and phospho-ERK1/2, 25 cases (24.3%) were positive for NDRG3 protein expression, which was significantly associated with ERK1/2 activation (Figure 7D). In summary, these results indicate that aberrant NDRG3



(G) Dependence of the tumorous growth of Huh-1 cells on LDHA and its rescue by NDRG3. The p values were assessed by Student’s t test, n = 5/group. 

(H) Inhibition of hypoxia-induced angiogenesis by oxamate and its rescue by NDRG3. Results are mean ± SD of two experiments. The p values were assessed by Student’s t test. Scale bar, 200 μm.

See also Figure S4. 
Figure 5. Lactate Binds to NDRG3 and Blocks Its Ubiquitination by VHL

(A) Effects of lactate on in vitro ubiquitination of NDRG3 by the recombinant PHD2/VHL complex immunoprecipitated from HEK293T cells expressing PHD2-Flag and VHL-HA.

(B) Molecular interaction between L-lactate and NDRG3 protein in vitro. Results are mean ± SD of three experiments. The p value was assessed by Student’s t test.

(C) Molecular interaction between L-lactate and a variant NDRG3 protein (N3(G138W)) mutated in the putative L-lactate binding site. Results are mean ± SD of 
three experiments. The p value was assessed by Student’s t test. 

(D) Interaction profile of NDRG3 protein with PHD2 or VHL in HEK293T cells during the progression of hypoxia. Quantified values for western blot images are 
shown on the right. Results are mean ± SD of two experiments. The p value was assessed by Student’s t test. 

expression is closely associated with tumor development in vivo
as well as the pathological activation of the ERK pathway. 
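The association between NDRG3 positivity and ERK1/2 activation in the HCC samples is a 2 × 2 contingency-table comparison, commonly assessed with a Pearson χ² test of independence. The sketch below computes that test without Yates' continuity correction, using the one-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)); whether the original analysis applied a continuity correction is not stated here, so that choice is an assumption. The full cross-tabulation appears only in Figure 7D, so the example counts are hypothetical; only the figure of 25 NDRG3-positive cases out of 103 comes from the text.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for the 2 x 2 table

                    pERK+   pERK-
        NDRG3+        a       b
        NDRG3-        c       d

    without Yates' continuity correction. Returns (statistic, p value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all row and column totals must be non-zero")
    # Shortcut form of sum((observed - expected)^2 / expected) for a 2 x 2 table
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # One degree of freedom: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Share of NDRG3-positive cases reported in the text: 25 of 103
    print(f"{100 * 25 / 103:.1f}%")
    # Hypothetical counts for illustration only (not the Figure 7D data)
    print(chi2_2x2(10, 2, 3, 9))
```

The shortcut formula is algebraically identical to summing (observed − expected)²/expected over the four cells; `scipy.stats.chi2_contingency` with `correction=False` should give the same statistic for a 2 × 2 table.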

DISCUSSION 

Lactate had long been regarded as a dead-end product of glycolysis and glutaminolysis until it recently emerged as an alternative energy source and an inducer of tumor angiogenesis (Doherty and Cleveland, 2013). Knockdown of LDHA expression or inhibition of its activity suppresses tumor cell growth in vitro and in vivo (Fantin et al., 2006; Le et al., 2010). However, the key elements and mechanisms of lactate-induced biological responses have remained unknown. In this study, we showed the existence of NDRG3-mediated lactate signaling and its roles in hypoxia responses. During hypoxia, low oxygen concentrations and elevated lactate levels strongly induced NDRG3 protein expression, leading to activation of the Raf-ERK pathway to promote angiogenesis and hypoxic cell growth. Thus, NDRG3 acts as a lactate sensor that triggers downstream kinase signaling in a hypoxia-dependent manner, and the NDRG3-Raf-ERK axis provides the genetic basis for the lactate-induced hypoxia responses.

We showed that NDRG3 expression is genetically independent of HIFs and is instead determined at the protein level by lactate. Lactate accumulates in the later phase of hypoxia, promoted by the upregulation of glycolysis and LDHA expression during the earlier stages of hypoxia, in which HIF-1α plays a critical role as part of metabolic adaptation (Cassavaugh and Lounsbury, 2011). Therefore, lactate signaling and the subsequent biological responses appear to be functionally coupled to HIF-1α-induced metabolic reprogramming, with NDRG3 as the critical link. In this regard, some of the hypoxia responses, especially those occurring in the later phase of hypoxia, that have so far been attributed to HIF-1α might in fact be under the direct control of NDRG3-mediated lactate signaling. The results of gene set enrichment analysis of the functions of NDRG3 and HIF-1α during hypoxia support this possibility (Figures 3 and S3). Our study therefore suggests that HIF-1α and NDRG3 might form an oxygen-dependent regulatory chain for hypoxia responses that is broadly divided into two chronological phases (Figure 7E): in the early phase, low O2 levels signal for the accumulation of HIF-1α, which then regulates the gene expression necessary for early adaptive responses, including metabolic reprogramming; in the later phase, upregulated lactate production signals for the accumulation of NDRG3, which subsequently activates the Raf-ERK pathway to induce the responses necessary for coping with prolonged hypoxia.

The lactate-NDRG3-Raf-ERK axis of hypoxia signaling suggests that hypoxic lactate production might be an integral part of normal physiology, playing active roles in promoting angiogenesis and cell growth under prolonged hypoxia. It stands to reason that the functional coupling between HIF-1α-induced metabolic reprogramming and NDRG3-mediated lactate signaling ensures that cells facing prolonged hypoxia achieve the maximal possible growth in a hypoxic environment. This can be achieved first by generating biosynthetic building blocks and energy via HIF-1α-mediated upregulation of glycolysis, and subsequently by providing cues for cell growth and angiogenesis via NDRG3-mediated c-Raf-ERK signaling. NDRG3-mediated lactate signaling may therefore provide a self-sufficient mechanism for cells in local tissues to recover from hypoxia without the need for additional extracellular signals, for example, during development. Moreover, NDRG3-mediated signaling provides an extra layer of biological security for cells escaping prolonged hypoxia, since the NDRG3 protein, once stabilized by lactate binding, remains quite stable even after cells are reoxygenated.

Growing evidence suggests that lactate may play active roles in cancer progression, as it mediates cancer-cell-intrinsic effects on metabolism as an oxidative metabolite and non-cell-autonomous effects on several cell types in the tumor microenvironment (Doherty and Cleveland, 2013). Our results indicate that glycolysis is the main source of the lactate responsible for the hypoxic induction of NDRG3 protein expression and Raf-ERK activation. Because cancer cells frequently exhibit an increased dependence on glycolysis, the discovery of the lactate-NDRG3-Raf-ERK axis and its role in angiogenesis and hypoxic cell growth may provide an important explanation for the growth advantage that a glycolytic phenotype confers on cancers. In this regard, lactate might be considered an oncometabolite that drives the progression of solid tumors as an alternative fuel, an agent modulating the tumor microenvironment, and a signaling molecule.

Many characteristics of hypoxia responses are also exploited by diseased cells (Cassavaugh and Lounsbury, 2011). Hypoxia is correlated with poor prognosis and poor treatment outcome in cancers (Jubb et al., 2010; Semenza, 2004), and hypoxia has therefore been an important target for cancer therapy. Although HIF is the prime target in this regard, concerns have been raised that simple inhibition of HIF may not be enough to prevent the progression of hypoxia-induced diseases, since many studies have indicated that compensatory, HIF-independent pathways can be induced when a single factor is inhibited (Mizukami et al., 2005, 2007; Carmeliet et al., 1998; Rapisarda et al., 2009; see Introduction for supporting examples). These observations collectively led to the suggestion that the most successful anti-hypoxia strategy may require a combination of agents inhibiting HIF-independent as well as HIF-dependent pathways (Mizukami et al., 2007; Fong, 2008). Despite the likelihood of functional coupling with HIF-1α, NDRG3 seems to have distinct functions in the regulation of hypoxia responses, as indicated by gene set enrichment analysis of the transcriptome data for NDRG3- and HIF1A-depleted cells during



(E) Interaction of different forms of NDRG3 proteins with PHD2 or VHL in HEK293T cells at different oxygen conditions. WT, wild-type. Quantified values for 
western blot images are shown on the right. Results are mean ± SD of three experiments. The p value was assessed by Student’s t test. 

(F) Ubiquitination assay of variant NDRG3 proteins from HEK293T cells grown in normoxia or under hypoxia. 

(G) Effects of inhibiting hypoxic lactate production by oxamate on the interaction of different NDRG3 proteins with PHD2 or VHL in HEK293T cells. Quantified 
values for western blot images are shown on the right. Results are mean ± SD of three experiments. The p value was assessed by Student’s t test. 

See also Figure S5. 
Figure 6. NDRG3 Is Required for Hypoxia-Induced Raf-ERK1/2 Activation

(A) Effects of NDRG3 knockdown on hypoxia-induced ERK1/2 activation in SK-Hep-1 cells.

(B) Effects of NDRG3 knockdown on hypoxia-induced RAF activation in SK-Hep-1 cells. 

(C) Activation of ERK1/2 and c-Raf by NDRG3 in normoxia in HEK293T cells. 

(D) Suppression of the hypoxic phosphorylation of c-Raf and ERK1/2 by NDRG3 knockdown and its rescue by ectopic expression of NDRG3. 

(E) In vitro phosphorylation of recombinant c-Raf protein by the NDRG3-containing complex immunoprecipitated from HEK293T cells. 

(F) Interaction profile between endogenous NDRG3 and c-Raf proteins during progression of hypoxia. Phosphorylation profiles of c-Raf and ERK are also shown.

(G) Lactate dependence of the NDRG3-mediated c-Raf-ERK activation during hypoxia in Huh-1 cells. 

(H) Suppression of the hypoxic phosphorylation of c-Raf and ERK1/2 by glucose deprivation and its rescue by NDRG3 overexpression. 

See also Figure S6. 



hypoxia. Therefore, these observations, along with the roles of NDRG3 in hypoxia responses shown in this study, suggest that combinatorial targeting of HIF and NDRG3 might prove highly effective in cancer therapy. The abrogation of tumor growth when NDRG3 was depleted in combination with either HIF supports the feasibility of this strategy (Figure 3H).



In conclusion, NDRG3 provides crucial genetic evidence for the oxygen-dependent regulation of HIF-independent hypoxia signaling. The regulation and functions of NDRG3 in hypoxia imply that the PHD2/VHL system can control both HIF-dependent and HIF-independent hypoxia responses in an oxygen-dependent manner. Therefore, the lactate-NDRG3-Raf-ERK signaling
Figure 7. Lactate-Induced Cell Growth and Angiogenesis Depend on NDRG3 Expression and ERK1/2 Activity

(A) Dependence of the hypoxic growth of Huh-1 cells on lactate, NDRG3, and ERK1/2 activity. 

(B) Dependence of the hypoxia-induced angiogenic activity of Huh-1 cells on lactate, NDRG3, and ERK1/2 activity. 

(C) Upregulation of c-Raf-ERK1/2 phosphorylation and pro-angiogenic gene expression in tumor xenografts formed by Huh-1 cells overexpressing NDRG3. 

(D) Immunohistochemical analysis of NDRG3 and phospho-ERK1/2 expression in human liver cancers. The relationship between NDRG3 protein and phospho-ERK1/2 expression was assessed by χ² test.

(E) A scheme outlining the regulatory mechanism for prolonged hypoxia responses involving lactate and NDRG3. 

See also Figure S7. 
pathway may provide further mechanistic insight into disorders caused by mutations in VHL (hemangioblastoma, renal cell carcinoma, pheochromocytoma, etc.) (Maher et al., 2011) or PHD2 (familial erythrocytosis-3) (Percy et al., 2006), as well as into hypoxia-related physiological and pathophysiological responses (Cassavaugh and Lounsbury, 2011).

EXPERIMENTAL PROCEDURES 
Cell Lines 

Human cell lines PLC/PRF/5, SK-HEP-1, MCF-10A, MCF-7, IMR-90, HeLa, SW480, and HEK293T were purchased from the American Type Culture Collection. Two human hepatoma cell lines, Huh-1 and Huh-7, were obtained from the Japanese Cancer Research Resources Bank. Cells were cultured under standard conditions (see Extended Experimental Procedures).

Identification of PHD2-Binding Proteins 

In order to identify PHD2-binding proteins, we carried out mass spectrometric 
analysis of the proteins immunoprecipitated from MCF-7 cells cultured under 
hypoxia for 24 hr in the presence of the proteasome inhibitor, MG132. We 
avoided the yeast two-hybrid screen as it has known technical limitations for 
some types of proteins. A detailed method is described in Extended Experi- 
mental Procedures under the subtitle Micro-LC-MS/MS Analysis and Protein 
Database Search. 

RNA Interference 

We used commercial pooled siRNA products (SMARTpool, Dharmacon) for transient knockdown of NDRG3, NDRG1, HIF-1A, EPAS1 (HIF-2A), ARNT (HIF-1B), VHL, LDHA, MCT1, and MCT4. Otherwise, siRNAs were synthesized by Samchullypharm (Korea). The sequences of siRNAs are listed in Table S1.

Protein Structure Modeling and Docking Simulation

Prediction of NDRG3 protein structure was achieved using Modeler 9v10 (Eswar et al., 2008). Protein-protein and protein-ligand docking simulations were performed using the HEX6.3 program (Ritchie and Kemp, 2000) and the AutoDock Vina software (http://vina.scripps.edu/index.html), respectively. Detailed methods are described in Extended Experimental Procedures.

L-Lactate Measurement and Binding Assay 

L-Lactate production was measured using the EnzyChrom L-Lactate Assay kit (BioAssay Systems). The protocol for the analysis of interaction between recombinant NDRG3-GST protein and L-[¹⁴C]lactate (PerkinElmer) is described in Extended Experimental Procedures.

Statistical Analysis

Statistical significance of the data was mostly assessed by using the Student’s 
t test except for the tissue microarray data for which the test was used. 
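The Student's t test named above compares two group means against their pooled variability. As an illustration only, and not the authors' analysis code, the following minimal sketch computes the unpaired two-sample t statistic under the classic equal-variance assumption. The function name `student_t` and the sample values are hypothetical. Turning t into a p-value additionally requires the t distribution (for example via `scipy.stats.ttest_ind`), which is omitted here to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Unpaired two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes both groups share one variance.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: each group's sample variance weighted by its own degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical measurements for two conditions (not data from the paper).
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t, df = student_t(control, treated)
print(f"t = {t:.3f} with {df} degrees of freedom")  # prints t = -3.674 with 4 degrees of freedom
```

The pooled form is what distinguishes Student's test from Welch's variant, which does not assume equal variances.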

Miscellaneous Methods

Virus-mediated gene expression, immunoprecipitation and western blotting, RT-PCR, site-directed mutagenesis, expression and purification of recombinant proteins, production of anti-NDRG3 antibody, ubiquitination assays, gene expression profiling, cell growth assays, apoptosis assay, in vitro kinase assay, tumorigenesis in a mouse model, in vivo angiogenesis assay, immunofluorescence microscopy, and tissue microarray analysis are described in Extended Experimental Procedures. Experiments involving human and animal subjects were approved by the Institutional Review Board of Inje University Seoul Paik Hospital (Seoul, Korea) and KRIBB, respectively. Antibodies and primer sequences used for RT-PCR analyses and site-directed mutagenesis are listed in Tables S2, S3, and S4.

ACCESSION NUMBERS 

The GEO accession number for the microarray data reported in this paper is 
GSE55214. 



SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven figures, and four tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.03.011.

In Brief

Neuronal swelling, the major cause of death in traumatic and ischemic brain injuries, is initiated when aberrant entry of sodium ions and depolarization activate the voltage-gated chloride channel SLC26A11. The increase of cytoplasmic sodium and chloride causes an osmotic imbalance that leads to water entry and cytotoxic edema, a mechanism that could be targeted to prevent and treat brain edema.



Highlights 

• Neuronal swelling depends on Na⁺ and Cl⁻ influx but is independent of Ca²⁺ influx

• Neuronal swelling after Na⁺ and Cl⁻ influx causes Ca²⁺-independent neuronal death

• Knockdown of the ion exchanger SLC26A11 attenuates neuronal swelling

• SLC26A11-dependent Cl⁻ influx occurs via voltage-gated Cl⁻ channel activity

SUMMARY

Cytotoxic brain edema triggered by neuronal swelling is the chief cause of mortality following brain trauma and cerebral infarct. Using fluorescence lifetime imaging to analyze contributions of intracellular ionic changes in brain slices, we find that intense Na⁺ entry triggers a secondary increase in intracellular Cl⁻ that is required for neuronal swelling and death. Pharmacological and siRNA-mediated knockdown screening identified the ion exchanger SLC26A11, which unexpectedly acts as a voltage-gated Cl⁻ channel that is activated upon neuronal depolarization to membrane potentials lower than −20 mV. Blockade of SLC26A11 activity attenuates both neuronal swelling and cell death. Therefore, cytotoxic neuronal edema occurs when sufficient Na⁺ influx and depolarization are followed by Cl⁻ entry via SLC26A11. The resultant NaCl accumulation causes subsequent neuronal swelling leading to neuronal death. These findings shed light on unique elements of volume control in excitable cells and lay the groundwork for the development of specific treatments for brain edema.

INTRODUCTION 

Brain edema, the pathological hallmark of excitotoxic injury and traumatic brain injury (Donkin and Vink, 2010; Klatzo, 1987; Marmarou et al., 2006; Rosenblum, 2007), was first characterized by Klatzo (1967) as either vasogenic or cytotoxic. Cytotoxic brain edema is caused by water movement into the intracellular compartment of neurons and/or astrocytes, leading to brain swelling, while vasogenic edema is due to water entry into the brain from the vasculature (Klatzo, 1967). Excitotoxic swelling of cultured neurons is known to involve influx of both Na⁺ and Cl⁻, although the influx pathway(s) for Cl⁻ remain obscure (Choi, 1987; Hasbani et al., 1998; Rothman, 1985). The low resting Cl⁻ permeability in neurons suggests that a Cl⁻ channel or exchange mechanism must be activated for Cl⁻ entry to occur at levels sufficient to increase cell volume and cause cytotoxic edema. In mature pyramidal neurons of the cortex and hippocampus, the equilibrium potential for Cl⁻ (ECl) is set more hyperpolarized than the resting membrane potential (Em) by KCC2-mediated active transport of Cl⁻ out of the cell against its electrochemical gradient (Blaesse et al., 2009). As such, the Cl⁻ influx required for cytotoxic neuronal edema must result either from the activation of a Cl⁻ channel that is not open at rest or from the activation of a Cl⁻ transporter. Putative candidates for Cl⁻ loading leading to swelling are the volume-regulated anion channel (VRAC), the Na⁺-K⁺-Cl⁻ cotransporter 1 (NKCC1), and GABA-activated Cl⁻ channels (Allen et al., 2004; Hasbani et al., 1998; Inoue et al., 2005; Pond et al., 2006). In addition, several newly described Cl⁻ channels and transporters could also be important contributors to neuronal edema. Our experiments were designed to examine the interrelationship between neuronal volume, intracellular Na⁺ concentration ([Na⁺]i), and intracellular Cl⁻ concentration ([Cl⁻]i) in order to investigate the roles of Cl⁻ entry pathways that contribute to neuronal swelling leading to cell death.
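The argument that Cl⁻ loading requires an opened pathway rests on the Nernst equation, ECl = (RT/zF) ln([Cl⁻]o/[Cl⁻]i) with z = −1. Because Cl⁻ is an anion, a Cl⁻ conductance carries Cl⁻ into the cell whenever Em is more positive than ECl, so depolarization both increases the inward driving force and, as [Cl⁻]i rises, shifts ECl toward more positive values. The sketch below is a minimal illustration only; the concentrations (130 mM outside, 5 or 20 mM inside) and the 22°C temperature are assumed values chosen for the example, not measurements from this study.

```python
from math import log

R = 8.314462618        # gas constant, J/(mol*K)
FARADAY = 96485.33212  # Faraday constant, C/mol


def nernst_mv(conc_out_mm, conc_in_mm, valence, temp_c=22.0):
    """Equilibrium potential (mV): E = (R*T / (z*F)) * ln([ion]_out / [ion]_in)."""
    temp_k = temp_c + 273.15
    return 1000.0 * R * temp_k / (valence * FARADAY) * log(conc_out_mm / conc_in_mm)


# Illustrative, assumed concentrations (mM); chloride has valence z = -1.
e_cl_low = nernst_mv(130.0, 5.0, -1)    # low [Cl-]i, as maintained by KCC2 extrusion
e_cl_high = nernst_mv(130.0, 20.0, -1)  # after Cl- loading
print(f"E_Cl at 5 mM [Cl-]i:  {e_cl_low:.1f} mV")
print(f"E_Cl at 20 mM [Cl-]i: {e_cl_high:.1f} mV")
```

With these assumed values, ECl moves from roughly −83 mV to about −48 mV as [Cl⁻]i rises, illustrating how Cl⁻ loading erodes the hyperpolarized Cl⁻ gradient set by KCC2.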

Neuronal swelling occurs as a result of multiple depolarizing triggers that increase [Na⁺]i, including excessive glutamate receptor activation, intense neuronal spiking, activation of nonselective cation channels, and inhibition of Na⁺/K⁺-ATPase (Liang et al., 2007). We tested the impact of increasing [Na⁺]i via ligand- or voltage-gated ion channels on neuronal swelling to test the hypothesis that extensive Na⁺ influx itself, independent of the route of entry, leads to swelling by triggering Cl⁻ influx. Two-photon imaging of cell morphology and fluorescence lifetime measurements (FLIM) of [Na⁺]i and [Cl⁻]i in hippocampal and cortical neurons in acutely prepared brain slices were combined to specifically examine the relationship between increased [Na⁺]i, subsequent [Cl⁻]i changes, and neuronal swelling. The cytotoxic nature of this swelling was measured by lactate dehydrogenase (LDH) efflux (e.g., Kajta et al., 2005). Pharmacological blockers of known Cl⁻ channels and exchangers were further examined in order to determine the relative contribution of different Cl⁻ loading pathways to neuronal swelling. Finally, a lipid nanoparticle (LNP) strategy to introduce siRNA into neurons in vivo (Rungta et al., 2013) was employed to identify the Cl⁻ pathway required for the majority of neuronal swelling. The results indicate that a significant proportion of neuronal swelling and subsequent cell death requires SLC26A11, a protein that can act as a Cl⁻/HCO₃⁻/SO₄²⁻ exchanger or a Cl⁻ channel in expression systems and that was recently reported to be highly expressed in cortical and hippocampal neurons (Rahmati et al., 2013). The identification of the principal pathway required for Cl⁻ entry could potentially lead to novel targets and therapies for treating cytotoxic brain edema.

RESULTS 

Increased Intracellular Sodium Triggers Neuronal 
Swelling 

We first investigated whether increasing [Na⁺]i was itself capable of triggering a cascade leading to an increase in cell volume and, second, whether this cascade also leads to rapid cell death. Two parallel and independent approaches were taken to increase [Na⁺]i: applying veratridine, which removes inactivation of voltage-gated sodium channels (VGSCs) (Strichartz et al., 1987) and thereby prolongs Na⁺ entry, or applying NMDA to activate NMDA receptors (NMDARs). NMDA activates a nonselective cation conductance leading to entry of Na⁺ and also Ca²⁺. Neuronal Na⁺ entry was induced under conditions in which other voltage-gated ion channels and ligand-gated transmitter receptors were blocked by a combination of Cd²⁺ (30 μM), CNQX (20 μM), and picrotoxin (100 μM). Either veratridine or NMDA was rapidly applied by pressure ejection from a pipette positioned directly above the region of the brain slice that was imaged. To ensure the selectivity of either approach, veratridine was applied with d-APV (100 μM) to block NMDARs, and NMDA was applied with TTX (1 μM) to block VGSCs. Changes in [Na⁺]i were monitored using the fluorescent Na⁺ indicator CoroNa-Green (Meier et al., 2006), which preferentially stains hippocampal and cortical neurons in brain slices (Figure 1A). Astrocytes, which did not show any obvious volume changes under these experimental manipulations, were visualized using sulforhodamine 101 (SR101) (Nimmerjahn et al., 2004) to provide landmarks to track during swelling of the tissue (red cells in Figures 1A and 1B). The activation of either VGSCs by veratridine or NMDARs by NMDA consistently led to a significant increase in [Na⁺]i followed, after a delay of seconds, by an increase in neuronal cell volume (Figures 1B-1D, 1J, 1K, 2A, and 2B and Movie S1). We further compared the impact of Ca²⁺ versus Na⁺ entry through NMDARs on swelling by repeating experiments in Ca²⁺- or Na⁺-free extracellular solutions. The increase in cell volume from NMDAR activation was still observed in Ca²⁺-free extracellular solution (cross-sectional area increased to 161.60% ± 10.55% of baseline). However, in the presence of a low extracellular Na⁺ concentration ([Na⁺]ext) and normal Ca²⁺, swelling was completely absent, and NMDAR activation actually resulted in a decrease in neuronal volume (Figures 1J, 2C, and 2D). Control experiments showed that neuronal [Na⁺]i increases and swelling induced by veratridine were blocked by the VGSC antagonist TTX (Figures 1J and 1K; p < 0.001, two-tailed Student’s t test) and that those induced by NMDA were blocked by the NMDAR antagonist d-APV (Figures 1J and 1K; p < 0.001, ANOVA). Our experimental assay was performed at room temperature to facilitate the imaging of AM indicator dyes, which are more rapidly extruded from neurons at 37°C (Beierlein et al., 2004) (Figure S1). However, as the function of many transporters and metabolic proteins that govern ion transport is temperature-dependent, we confirmed that increases in [Na⁺]i equally cause swelling of neurons at 37°C (Figure S1).

Although an increase in Na⁺ preceding swelling was consistently observed, the magnitude and duration of CoroNa fluorescence signals were distorted during cellular swelling due to dye dilution. This is consistent with our observation that swelling was associated with reduced fluorescence intensity of the inert dye calcein red-AM (Figure S2). However, without the ability to dissociate changes in [Na⁺]i from changes in dye concentration, it is not possible to conclude that [Na⁺]i itself is not also decreasing during swelling. In order to define the true magnitude and time course of the [Na⁺]i increases, we developed a method to record real-time calibrated measurements of [Na⁺]i using two-photon FLIM, which is independent of changes in dye concentration. When lifetime measurements of CoroNa were first tested in iso-osmotic salt solutions, the time constant of decay (τ) increased with increasing [Na⁺] (Figure 1G). However, as the local environment can affect lifetime measurements of dyes (Berezin and Achilefu, 2010), calibrations of CoroNa lifetimes were obtained within the cytoplasm of neurons by whole-cell voltage-clamping of neurons and dialysis with different [Na⁺]. CoroNa lifetimes were best fit using a biexponential decay (Figure S3), with a short lifetime (τfast) predictive of [Na⁺]i (Figures 1H and 1I). FLIM of CoroNa-loaded neurons revealed that [Na⁺]i increased to approximately 94.46 ± 2.14 mM (calibrated value) throughout veratridine application and gradually recovered after washout (Figures 1E and 1F and Movie S2). These results demonstrate that the decrease in CoroNa fluorescence as the neurons swell is primarily due to dye dilution and not to a dilution of [Na⁺]i itself.
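The logic of the lifetime-based calibration can be sketched in a few lines. A biexponential decay is the sum of a fast and a slow exponential; the amplitudes scale with dye concentration (and so fall as swelling dilutes the dye), whereas the lifetimes do not. A fitted τfast can then be converted to [Na⁺]i by interpolating a calibration curve from dialyzed neurons. This is a minimal sketch, not the authors' procedure. It assumes τfast rises monotonically with [Na⁺] (as τ does in salt solutions) and uses piecewise-linear interpolation. The calibration points, lifetimes, and function names are all hypothetical, and the decay-fitting step itself is omitted.

```python
from bisect import bisect_left
from math import exp


def biexp_decay(t_ns, amp_fast, tau_fast_ns, amp_slow, tau_slow_ns):
    """Biexponential decay model: I(t) = A_fast*exp(-t/tau_fast) + A_slow*exp(-t/tau_slow).

    The amplitudes scale with dye concentration; the lifetimes tau do not.
    """
    return amp_fast * exp(-t_ns / tau_fast_ns) + amp_slow * exp(-t_ns / tau_slow_ns)


def na_from_tau_fast(tau_fast_ns, calibration):
    """Convert a fitted tau_fast (ns) to [Na+]i (mM) by piecewise-linear interpolation.

    calibration: (tau_fast_ns, na_mm) pairs sorted by strictly increasing tau_fast.
    """
    taus = [tau for tau, _ in calibration]
    if not taus[0] <= tau_fast_ns <= taus[-1]:
        raise ValueError("tau_fast outside the calibrated range; not extrapolating")
    i = bisect_left(taus, tau_fast_ns)
    if taus[i] == tau_fast_ns:
        return calibration[i][1]
    (tau0, na0), (tau1, na1) = calibration[i - 1], calibration[i]
    return na0 + (na1 - na0) * (tau_fast_ns - tau0) / (tau1 - tau0)


# Hypothetical calibration points (tau_fast in ns, [Na+] in mM), for illustration only.
CALIBRATION = [(0.40, 9.0), (0.55, 40.0), (0.70, 109.0)]

# A 2-fold dye dilution halves the intensity at every time point but leaves tau unchanged,
# so a lifetime-based estimate is insensitive to swelling-induced dye dilution.
full = biexp_decay(1.0, 1.0, 0.6, 0.5, 3.0)
diluted = biexp_decay(1.0, 0.5, 0.6, 0.25, 3.0)
print(f"intensity ratio after 2x dilution: {diluted / full:.2f}")
print(f"[Na+]i at tau_fast = 0.625 ns: {na_from_tau_fast(0.625, CALIBRATION):.1f} mM")
```

Refusing to extrapolate beyond the calibrated τfast range is a deliberate choice: values outside the dialysis concentrations have no calibration support.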

Cl⁻ Influx Is Required for Na⁺-Induced Neuronal Swelling

Since cytoplasmic impermeant anions make up the bulk of the intracellular anionic milieu, changes in [Cl⁻]i must be met by an accompanying influx of water, possibly via transporters (Zeuthen, 2010), in an attempt to achieve Gibbs-Donnan equilibrium (Glykys et al., 2014). We therefore examined whether prolonged [Na⁺]i increases were associated with a secondary influx of Cl⁻, and further whether Cl⁻ entry was ultimately required for neuronal swelling. Using two-photon FLIM of the Cl⁻-sensitive dye MQAE (Ferrini et al., 2013; Verkman et al., 1989), we observed that [Cl⁻]i increased in neurons (indicated by a decrease in the fluorescence lifetime) when Na⁺ influx was triggered by veratridine application (Figures 3A and 3B). This Cl⁻ influx was independent of entry via GABAARs, as all experiments were performed in the presence of the ligand-gated Cl⁻ channel antagonist picrotoxin (100 μM).
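Why Cl⁻ entry should be needed for swelling can be illustrated with an ideal-osmometer (Boyle-van't Hoff) estimate. Electroneutrality means Na⁺ cannot accumulate on its own: each Na⁺ ion that enters must be balanced either by a cation leaving (for example K⁺) or by an anion entering (for example Cl⁻). Of these two cases, only the second adds net osmoles and therefore draws in water. This minimal sketch assumes all cell water is osmotically active, that extracellular osmolarity stays constant at an assumed 290 mOsm/L, and that only monovalent ions move. The flux values are illustrative, not measurements from this study, and the function name is hypothetical.

```python
REST_OSMOLARITY = 290.0  # assumed intra- = extracellular osmolarity at rest, mOsm/L


def relative_volume(na_in_mm, cl_in_mm, k_out_mm, rest_osm=REST_OSMOLARITY):
    """Ideal-osmometer estimate of V/V0 after monovalent ion fluxes.

    Fluxes are expressed per litre of the ORIGINAL cell volume. Water is assumed to
    move until intracellular osmolarity again equals the (constant) extracellular value.
    """
    # Electroneutrality: cations in (Na+) must equal anions in (Cl-) plus cations out (K+).
    if abs(na_in_mm - (cl_in_mm + k_out_mm)) > 1e-9:
        raise ValueError("flux scenario is not electroneutral")
    net_osmoles = na_in_mm + cl_in_mm - k_out_mm
    return (rest_osm + net_osmoles) / rest_osm


# Na+ entry balanced by K+ exit: no net osmoles, so no volume change.
print(relative_volume(na_in_mm=80.0, cl_in_mm=0.0, k_out_mm=80.0))
# Na+ entry accompanied by Cl- entry: 160 mOsm gained, roughly a 1.55-fold volume.
print(round(relative_volume(na_in_mm=80.0, cl_in_mm=80.0, k_out_mm=0.0), 2))
```

The model deliberately ignores non-osmotic cell volume and impermeant-anion (Donnan) effects, so it indicates direction and rough scale rather than the measured swelling.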

We next investigated whether neuronal Na⁺ and subsequent Cl⁻ influx were sufficient to increase tissue volume by imaging hippocampal/cortical brain slices at low magnification. Application of veratridine triggered dramatic swelling of brain slices that was reduced but still substantial even when a number of Na⁺, Ca²⁺, and Cl⁻ entry pathways were reduced by blockade of glutamate-gated AMPARs and NMDARs, voltage-gated Ca²⁺ channels (VGCCs), and GABA-activated Cl⁻ channels with a cocktail of blockers (20 μM CNQX, 100 μM d-APV, 30 μM Cd²⁺, and 100 μM picrotoxin) (Figures 3C and 3D and Movie S3).

Figure 1. Neuronal Swelling Is Caused by Prolonged Increases in Intracellular Na⁺ and Is Independent of Ca²⁺

(A) CoroNa Green (Na⁺ indicator)-loaded neurons versus SR101-stained astrocytes (red) in a hippocampal brain slice imaged using two-photon laser scanning microscopy.

(B-D) Cortical neurons treated with veratridine (50 μM) show an increase in [Na⁺]i followed by swelling (increase in cross-sectional area). Astrocytes do not swell.
(E and F) CoroNa FLIM measurements of [Na⁺]i as neurons swell reveal the true time course and magnitude of Na⁺ signals, independent of dye concentration (n = 4).

(G-I) Calibration of FLIM measurements of neuronal [Na⁺]i with CoroNa. (G) Decay of CoroNa fluorescence changes in salt solutions with varying [Na⁺]. (H) Dual (simultaneous) whole-cell patch clamping of two neurons dialyzed with high (109 mM) and low (9 mM) [Na⁺]i shows distinct separation of lifetimes. (I) Calibration of

(legend continued on next page)

In contrast, blocking all Cl⁻ influx pathways by reducing the
concentration of extracellular Cl⁻ ([Cl⁻]ext) with iso-osmotic replacement of NaCl with Na-gluconate in the extracellular solution dramatically reduced the magnitude of the volume increase of brain slices (Figure 3D; p < 0.001, ANOVA). These results suggest that even when fast ionotropic glutamate and GABA-activated receptors are blocked, increased neuronal [Na⁺]i leads to cytotoxic edema of brain tissue that is dependent on Cl⁻ influx. We next tested whether reducing [Cl⁻]ext also prevented Na⁺-induced swelling of individual neurons. Indeed, reducing [Cl⁻]ext reduced the swelling of neurons visualized with CoroNa fluorescence (Figures 3E and 3F and Movie S4; p < 0.001, ANOVA) without affecting the [Na⁺]i signal (Figure 3H; p > 0.05, two-tailed Student’s t test). As it has been previously reported that GABAAR-mediated Cl⁻ influx can contribute both to neuronal swelling in cell culture (Hasbani et al., 1998) and to swelling following oxygen glucose deprivation in situ (Allen et al., 2004), we examined the contribution of GABAAR Cl⁻ influx to neuronal swelling under our experimental conditions. Consistent with previous reports, pre-application of the GABAAR antagonist picrotoxin slightly but significantly reduced the magnitude of neuronal swelling (from 161.7% to 146.9%; Figure 3F; p < 0.05, ANOVA); however, the majority of the volume increase persisted in picrotoxin, suggesting that swelling was dominated by Cl⁻ influx via an as yet unidentified mechanism. NMDA-induced swelling was also blocked by low [Cl⁻]ext (iso-osmotic replacement of NaCl with Na-isethionate) (Figure 3G; p < 0.05, two-tailed Student’s t test). Together, these data indicate that neuronal swelling requires Cl⁻ influx through a mechanism that is triggered by an increase in [Na⁺]i and that Na⁺ entry alone is not sufficient to swell neurons.

Figure 2. NMDAR Activation Triggers Neuronal Swelling that Requires Na⁺ Influx, but That Is Independent of Ca²⁺ Influx

(A and B) Na⁺ influx triggers an increase in neuronal volume, measured as the cross-sectional area, in the absence of extracellular Ca²⁺ (0 mM Ca²⁺, 2 mM EGTA) (n = 5).

(C and D) Iso-osmotic replacement of extracellular Na⁺ with NMDG (from 152 mM to 26 mM Na⁺), to reduce Na⁺ entry through NMDARs, prevents neurons from swelling and causes them to shrink (86.7% of baseline, p < 0.05) (n = 4).

Scale bars, 15 μm (B and D). Shaded area above and below the mean represents SEM.

Na⁺ and Cl⁻ Dependent Neuronal Swelling Causes Death

Aberrant calcium influx via NMDARs can lead to mitochondrial depolarization and cell death; however, Cl⁻ removal also reduces ischemia- and glutamate-evoked early neuronal death in cell culture (Choi, 1987; Goldberg and Choi, 1993; Rothman, 1985), suggesting the existence of two independent pathways ultimately leading to cell death. The impact of [Na⁺]i-triggered Cl⁻ entry and neuronal swelling on cell viability was further investigated using LDH release as a measure of cell death (e.g., Kajta et al., 2005). Even in the combined presence of CNQX, picrotoxin, and Cd²⁺ to block fast AMPA/KA receptors, GABA-activated Cl⁻ channels, and VGCCs, respectively, application (15 min) of either veratridine (50 μM) or NMDA (100 μM, in artificial cerebrospinal fluid [ACSF] containing 0 mM Ca²⁺ and 2 mM EGTA) caused a rapid and significant increase in LDH release, indicating that neurons were dying after 90 min (Figures 3I and 3J; p < 0.01, ANOVA). Both the NMDA-induced and veratridine-induced neuronal death, as indicated by LDH release, were abolished by reducing [Cl⁻]ext throughout the experiment (Figures 3I and 3J; p < 0.01, ANOVA). This suggests that Na⁺-induced Cl⁻ influx and subsequent swelling result in Ca²⁺-independent cell death.

CoroNa lifetimes measured in the soma of neurons dialyzed with different [Na⁺] show that [Na⁺]i can be predicted from τfast. Calibrated values for each [Na⁺] were obtained from n > 3 voltage-clamped neurons.

(J and K) Quantified data show that neuronal swelling is triggered by sodium influx via independent pathways. NMDAR-mediated swelling was dependent on Na⁺ influx and independent of Ca²⁺. Controls confirm that the Na⁺ signal and swelling caused by veratridine and NMDA were mediated via VGSCs and NMDARs, respectively, as they were blocked by the antagonists TTX (1 μM) and d-APV (100 μM).

All experiments were done in the presence of 30 μM Cd²⁺, 20 μM CNQX, and 100 μM picrotoxin. Additionally, neurons were pretreated with 100 μM d-APV (NMDAR antagonist) for veratridine experiments and 1 μM TTX (VGSC antagonist) for NMDA experiments to confirm that the pathways were independent. Scale bars, 20 μm (B) and 15 μm (H). VER, veratridine; x-sectional, cross-sectional; VGSC, voltage-gated sodium channel; SR101, sulforhodamine 101. Control values in (J) and (K) are also re-plotted in Figures 3 and 4. Error bars and shaded region above and below the mean represent SEM. See also Figures S1, S2, and S3 and Movies S1, S2, and S3.

Figure 3. Na⁺ Influx Is Correlated with a Secondary Cl⁻ Influx That Is Required for Neuronal Swelling and Causes Cell Death

(A and B) FLIM of the Cl⁻-sensitive dye MQAE shows that Cl⁻ influx is correlated with increases in [Na⁺]i (n = 5).

(C and D) Neuronal Na⁺ influx triggers an increase in brain tissue volume, shown by changes in the volume of a hippocampal brain slice. (D) A cocktail of fast glutamate receptor, GABA receptor, and VGCC blockers slightly reduces tissue swelling (p < 0.01), but significant Cl⁻-dependent swelling still occurs (p < 0.01), indicating that swelling is dominated by other mechanisms.

(E and F) Veratridine-triggered neuronal swelling is prevented by reducing extracellular Cl⁻ (10.5 mM) and is only partially inhibited by blocking GABAARs.

(G) NMDA-triggered swelling is blocked by reducing extracellular Cl⁻.

(H) Positive control shows that veratridine and NMDAR Na⁺ signals were unaffected by low-Cl⁻ solution.

(legend continued on next page)

Figure 4. Neuronal Swelling Reflects the Pharmacological Profile of a SLC4 or SLC26 Family Member

(A) Veratridine-induced neuronal swelling was blocked by the HCO₃⁻/Cl⁻ exchanger inhibitor DIDS (250 μM) but not by blockers of several other Cl⁻ channels or transporters (see Table S1).

(B) Positive control shows veratridine- and NMDA-induced Na⁺ signals in the presence of DIDS.

(C) NMDA-induced neuronal swelling was blocked by DIDS in a dose-dependent manner; control (n = 5), 250 μM (n = 4), 500 μM (n = 5), 1 mM (n = 5). All solutions contained blockers: 30 μM Cd²⁺, 20 μM CNQX, 100 μM picrotoxin, plus either 100 μM d-APV for veratridine experiments or 1 μM TTX for NMDA experiments. VER, veratridine. Error bars and shaded region above and below the mean represent SEM.

Pharmacological Analyses of the Predominant Cl⁻ Influx Pathway Required for Neuronal Swelling and Death

There are several candidates for the transmembrane influx of Cl⁻ in neurons that can be distinguished based on their sensitivity to different antagonists (Alvarez-Leefmans and Delpire, 2009; Jentsch et al., 2002; Verkman and Galietta, 2009) (Table S1). We hypothesized that by identifying and blocking the source of Cl⁻ entry that was triggered by Na⁺ entry, both the Na⁺-induced neuronal swelling and the corresponding cell death could be prevented. As a first step, pharmacological analyses using the imaging assay of neuronal swelling in brain slices were undertaken in order to screen for the possible involvement of different Cl⁻ channels and transporters. In separate experiments, the following blockers were tested as described in Table S1: NPPB (200 μM) to block the volume-regulated anion channel (VRAC, VSOR), zinc (300 nM) to block CLC-2, (100 nM) to block the Maxi-anion channel, niflumic acid (NFA) (200 μM) to block the Ca²⁺-activated Cl⁻ conductance (CaCC, bestrophin), carbenoxolone (CBX) (100 μM) to block pannexins/connexins, bumetanide (100 μM) to block cation chloride cotransporters (NKCC1 and KCC2), and DIDS (250 μM) to block SLC4 and SLC26 anion exchangers. All antagonists were both bath applied and present in the puffing pipette used to apply either NMDA or veratridine. Of the various Cl⁻ channel and transporter blockers examined, only DIDS reduced the swelling induced by increased [Na⁺]i (Figure 4A; p < 0.05 compared to all other antagonists, ANOVA). The small volume change in the presence of DIDS was not significantly different from those observed in low or extracellular solution (Figure 4A; p > 0.05, ANOVA). A substantial [Na⁺]i increase was still observed in DIDS, indicating that Na⁺ entry was not affected (Figure 4B). This pattern of block by DIDS, together with the lack of effect of the numerous other blockers, suggested that a member of the SLC4 or SLC26 families of anion exchangers was the most likely source of Cl⁻ entry. Although DIDS also blocks VRAC, which has been implicated in excitotoxic cell death in neuronal cell culture (Inoue and Okada, 2007), under our conditions we observed no protection of either cell volume or cell viability in the presence of the potent VRAC blocker NPPB. DIDS also blocked NMDA-evoked neuronal swelling in a dose-dependent manner (Figure 4C) and was confirmed to block veratridine-stimulated swelling at 37°C (Figure S1), suggesting a common mechanism.

As it was observed that extracellular Cl⁻ was required for both neuronal swelling and the subsequent cell death and that DIDS prevented neuronal swelling, we predicted that DIDS would block the Cl⁻-dependent cell death pathway without affecting the classic Ca²⁺-dependent death. DIDS was initially tested for its effectiveness in preventing the swelling-induced, Cl⁻-dependent cell death, as measured by LDH efflux, in brain slices exposed to veratridine. Indeed, DIDS prevented cell death from veratridine-induced Na⁺ influx and swelling (Figure 5A; p < 0.005, ANOVA), whereas the VRAC blocker NPPB had no effect. DIDS was further examined on both the NMDA Cl⁻-dependent, Ca²⁺-independent cell death pathway and the NMDA Ca²⁺-dependent cell death pathway. As predicted, DIDS blocked the cell death caused by NMDA in Ca²⁺-free extracellular solution (Figure 5B; p < 0.005, ANOVA). If, however, NMDA was applied in the presence of extracellular Ca²⁺ but reduced extracellular Na⁺, cell death still occurred (Figure 5C; p < 0.005, ANOVA) but was not blocked by DIDS (Figure 5C; p > 0.05, ANOVA). These results suggest that two independent cell death pathways coexist that can be distinguished based on their ionic basis: one that involves swelling, requires Na⁺ and Cl⁻ influx, is Ca²⁺ independent, and is blocked by DIDS, and one that is triggered by Ca²⁺ influx but is not DIDS sensitive.

(I) Neuronal Na⁺ influx via VGSCs causes cell death that is Cl⁻ dependent, as measured by LDH release.

(J) Neuronal Na⁺ influx via NMDARs causes cell death that is Cl⁻ dependent and Ca²⁺ independent. Slices were incubated in low [Cl⁻]o or control ACSF for the entire experiment, starting 20 min prior to either veratridine or NMDA (15 min). LDH was collected from the supernatant 1.5 hr after the end of veratridine or NMDA treatment.

Scale bars, 10 μm (A), 1.0 mm (C), 15 μm (E). For experiments in (A, B, E, and G-J), solutions contained blockers: 30 μM Cd²⁺, 20 μM CNQX, 100 μM picrotoxin, plus either 100 μM d-APV for veratridine experiments or 1 μM TTX for NMDA experiments. n values in (F): blockers (n = 5), +picrotoxin (n = 13), low Cl⁻ (n = 5). VER, veratridine; VGCC, voltage-gated calcium channel; VGSC, voltage-gated sodium channel. Error bars and shaded region above and below the mean represent SEM. See also Movie S3 and Movie S4.

Identification of SLC26A11 as the Predominant Cl⁻ Influx Pathway Underlying Na⁺-Dependent Cytotoxic Neuronal Swelling

Our data indicate that Na⁺ entry into neurons is linked to a DIDS-sensitive Cl⁻ influx pathway that is required for neuronal swelling and mediates cell death. Several DIDS-sensitive candidates are expressed in CNS neurons, including Cl⁻/HCO₃⁻ exchangers of the SLC4 family (Alvarez-Leefmans and Delpire, 2009; Boron et al., 2009; Romero et al., 2013). The DIDS-sensitive Cl⁻/HCO₃⁻ exchangers known to be expressed in the cortex and hippocampus are SLC4A3, SLC4A8, and SLC4A10 (Boron et al., 2009; Romero et al., 2013). In addition, SLC26A11 was recently shown to be highly expressed in CNS cortical neurons (Rahmati et al., 2013). SLC26A11 is a member of the sulfate transporter family that, in different expression systems, has been reported to act variously as a DIDS-sensitive sulfate transporter, a DIDS-sensitive exchanger for Cl⁻, SO₄²⁻, HCO₃⁻, or H⁺-Cl⁻, or as a Cl⁻ channel (Lee et al., 2012; Rahmati et al., 2013; Vincourt et al., 2003; Xu et al., 2011).

Using qRT-PCR, the expression of SLC4 and SLC26 family members was confirmed in both cortical and hippocampal brain tissue (Figure S4). Based on their combined pharmacological and expression profiles, SLC4A3, -A8, -A10, and SLC26A11 appeared to be the most promising candidates for the Cl⁻ entry pathway that causes neuronal swelling.

Figure 5. DIDS Blocks Na⁺ and Cl⁻ Dependent, Ca²⁺ Independent Cell Death

(A) LDH release measurements show that Na⁺- and Cl⁻-dependent cell death triggered by veratridine was blocked by the HCO₃⁻/Cl⁻ exchanger antagonist DIDS but not by the VRAC blocker NPPB.

(B) NMDAR Na⁺ influx triggers cell death in the absence of extracellular Ca²⁺ that is blocked by DIDS but not NPPB.

(C) NMDAR Ca²⁺ influx also triggers cell death that is not blocked by DIDS, indicating separate pathways. Error bars above and below the mean represent SEM.
We recently reported the development of an efficient LNP-mediated delivery system to introduce siRNAs against specific molecular targets into CNS neurons both in vivo and in vitro (Rungta et al., 2013). Individual siRNAs targeted against the different SLC candidate genes were encapsulated in DiI-labeled LNPs and initially tested for their ability to attenuate expression in both primary neuron cultures and a HEK cell expression system (Figure S4). These in vitro-validated siRNA LNPs against the four SLC candidates, or a control (luciferase) siRNA, were subsequently injected intracranially into the rat somatosensory cortex. After allowing 5-6 days for uptake of LNPs and knockdown of candidate proteins to occur, neurons that had taken up DiI-labeled LNPs were examined for Na⁺-induced, Cl⁻-dependent swelling in cortical slices. Knockdown of SLC4A3, SLC4A8, or SLC4A10, either separately or together, had no significant effect on the magnitude of veratridine-induced neuronal swelling compared to control luciferase siRNA-injected animals (Figures 6C, 6G, and S5; p > 0.05, ANOVA). In striking contrast, knockdown of SLC26A11 with two siRNAs targeted toward different sequences of SLC26A11 mRNA significantly reduced the magnitude of neuronal swelling (Figures 6D and 6H and Movie S5; p < 0.05, ANOVA comparing results from all siRNA groups [luciferase, A3, A8, A10, A3+A8+A10, A11 No.1, and A11 No.2]). SLC26A11 knockdown was further validated by western blot analysis of SLC26A11 protein in tissue 5 days following injection of SLC26A11 siRNA-LNPs (Figures 6A and 6B). These results indicate that the Cl⁻ influx required for neuronal swelling is mediated by an SLC26A11-dependent process.

Studies of the properties of recombinant SLC26A11 have shown that, depending upon the cell type in which it is expressed, this protein can act either as a Cl− channel or as a SO42− or oxalate transporter that is inhibited by DIDS or the CFTR antagonist GlyH-101 (Alper and Sharma, 2013; Rahmati et al., 2013; Stewart et al., 2011). We therefore investigated whether GlyH-101 similarly prevents neuronal swelling and the associated cell death, and whether neurons possess a Cl− current that is sensitive to both DIDS and GlyH-101. Similar to DIDS, GlyH-101 profoundly inhibited both veratridine-stimulated swelling (Figure 6E; p < 0.001, two-tailed Student's t test) and cell death (Figure 6F; p < 0.001, ANOVA).

The opening of Na+-permeable channels causes both [Na+]i accumulation and neuronal depolarization. The large (~80 mM) increases in [Na+]i occurred prior to the increases in cell volume (Figure 1), suggesting that compensatory mechanisms such as K+ efflux are initially sufficient to maintain osmotic equilibrium. Decreased intracellular K+ and progressive accumulation of extracellular K+ could also contribute to further depolarization of the membrane. We therefore tested the possibility that SLC26A11 in cortical neurons is required for a DIDS- and GlyH-101-sensitive Cl− channel that is opened by depolarization. Such outwardly rectifying, non-inactivating DIDS-sensitive conductances have previously been described in neurons (Smith et al., 1995), although their molecular identity remains unknown. Whole-cell voltage clamp recordings were obtained under conditions that reveal voltage-dependent Cl− currents, with other known voltage-gated channels blocked by a cocktail of blockers. We targeted layer 4 neurons in cortical slices (Figure 7A), the same cell types that were imaged in the swelling studies described above. Depolarization to −20 mV or greater elicited a non-inactivating Cl− current that was blocked by DIDS and was not present when external [Cl−] was reduced (Figures 7C-7E; p < 0.001, ANOVA). In addition, dialysis of neurons with GlyH-101, at concentrations that prevented neuronal swelling, also inhibited the voltage-dependent Cl− current and occluded the effect of DIDS (Figures 7D and 7E; p < 0.001, ANOVA). Finally, recordings were made from neurons transfected, using DiI-labeled LNPs, with siRNA against either SLC26A11 or luciferase (control). Knockdown of SLC26A11 attenuated the DIDS- and GlyH-101-sensitive Cl− current (Figures 7C-7E; p < 0.001, ANOVA), demonstrating that SLC26A11 protein is required for an outwardly rectifying Cl− current activated in substantially depolarized neurons.

DISCUSSION 

Our results demonstrate that prolonged Na+ entry via either of two independent pathways (VGSCs or NMDARs) converges to activate a Cl− influx pathway via SLC26A11 that is ultimately required for neuronal swelling and subsequent cell death. Unlike the rise in [Na+]i, whose osmotic influence on the cell can initially be offset by a compensating efflux of K+, the anionic intracellular milieu of the cell is made up largely of large impermeant anions. As such, increases in [Cl−]i likely maintain electroneutrality by retaining Na+ and K+ ions intracellularly, thereby increasing intracellular osmolarity and drawing water into the cell.

In mature pyramidal neurons of the cortex and hippocampus, the resting membrane potential (Em) is set positive to the equilibrium potential for Cl− (ECl), suggesting that Cl− is not passively distributed across the plasma membrane (Alvarez-Leefmans and Delpire, 2009). Changing the membrane potential also has little effect on [Cl−]i, indicating that there is little Cl− membrane permeability at rest (Thompson et al., 1988). As such, for [Cl−]i to increase rapidly in neurons, either a Cl− transporter has to be activated or a transmembrane Cl− channel has to be opened. Membrane depolarization could also further contribute to Cl− influx by increasing the driving force for Cl− entry.

Using an siRNA knockdown approach, we identified the molecular nature of the predominant Cl− influx pathway that is activated following increases in [Na+]i and causes neuronal cytotoxic edema. Our study demonstrates that SLC26A11 acts as a functional Cl− influx pathway in neurons. A recent study showed that SLC26A11 protein is expressed in neurons throughout the brain, and we would predict that similar mechanisms of swelling and neuronal death occur in many other areas, such as the cerebellum, where expression levels are high (Rahmati et al., 2013). SLC26A11, originally identified as a sulfate transporter, has been shown to operate in several modes, including as an exchanger for Cl−/SO42−, Cl−/HCO3−, or H+/Cl−, or as a Cl− channel, depending upon the tissue type and the expression system (Rahmati et al., 2013; Vincourt et al., 2003; Xu et al., 2011). The mechanism linking Na+ influx and SLC26A11-mediated Cl− influx is most simply explained by membrane depolarization activating SLC26A11 in its Cl− channel mode, thereby leading to a sustained Cl− influx. Our observation that SLC26A11 is required for Cl− channel activity (Rahmati et al., 2013) that opens with depolarizations greater than −20 mV suggests that Cl− would constantly enter the cell, as ECl in mature neurons is initially set close to −70 mV. During sustained depolarization, ECl would drift to more depolarized potentials; Cl− influx would therefore continue until equilibrium is reached or the membrane repolarizes.
Interestingly, depolarization of cortical neurons with high-K+ solution (40 mM) is not sufficient on its own to cause neuronal swelling; swelling occurs only when spreading depression develops, concurrent with depolarizations to approximately 0 mV and substantial extracellular K+ accumulation (Zhou et al., 2010; Zhou et al., 2013). A similar breakdown of ionic gradients occurs during pathological settings of cytotoxic edema, such as ischemia, when activation of voltage-gated and ligand-gated channels leads to massive increases in [Na+]i, followed by increases in extracellular K+ and almost complete depolarization of the neurons (Dreier, 2011; Somjen, 2001).
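The driving-force argument above can be made concrete with the Nernst equation, $E_{Cl} = -\frac{RT}{F}\ln\frac{[Cl^-]_o}{[Cl^-]_i}$. The sketch below is illustrative only: the function names are hypothetical, and the 37 °C temperature and the 130 mM external and 8 mM internal Cl− concentrations are assumptions chosen to place ECl near the −70 mV cited above, not measurements from this study. It shows that while Vm sits positive to ECl, Cl− moves inward, and that as [Cl−]i rises, ECl drifts positive until the inward drive vanishes and then reverses.

```python
import math

# Physical constants (SI units).
R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol


def nernst_mV(c_out_mM: float, c_in_mM: float, z: int, temp_C: float = 37.0) -> float:
    """Equilibrium (Nernst) potential in mV for an ion of valence z."""
    T = temp_C + 273.15
    return 1000.0 * (R * T) / (z * F) * math.log(c_out_mM / c_in_mM)


def cl_driving_force_mV(vm_mV: float, e_cl_mV: float) -> float:
    """Vm - ECl. Positive means Cl- flows inward (an outward current for an anion)."""
    return vm_mV - e_cl_mV


# Assumed (hypothetical) concentrations giving ECl near -70 mV.
e_cl_rest = nernst_mV(c_out_mM=130.0, c_in_mM=8.0, z=-1)
print(f"ECl at rest: {e_cl_rest:.1f} mV")

# As Cl- accumulates inside, ECl drifts positive, so the inward drive at a fixed
# depolarized Vm (-20 mV here) shrinks and eventually reverses sign.
for cl_in in (8.0, 20.0, 40.0, 80.0):
    e_cl = nernst_mV(130.0, cl_in, z=-1)
    drive = cl_driving_force_mV(-20.0, e_cl)
    print(f"[Cl-]i = {cl_in:>4} mM  ECl = {e_cl:6.1f} mV  drive at -20 mV = {drive:5.1f} mV")
```

With these assumed values, a neuron held at −20 mV keeps a positive (inward) Cl− drive until [Cl−]i climbs to several tens of millimolar, in line with the idea that influx continues until equilibrium is approached or the membrane repolarizes.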

Several questions arise as to the specific conditions and times at which SLC26A11 may modulate local and global Cl− concentrations. Aberrant Cl− homeostasis is central to several neurological diseases, and it would therefore be interesting to examine whether SLC26A11 expression or localization changes under such conditions. Epileptic seizures are commonly observed in patients following severe traumatic brain injury (TBI) (Annegers et al., 1998; Hung and Chen, 2012; Salazar et al., 1985). Increased [Cl−]i, leading to a depolarizing shift in EGABA (Cohen et al., 2002; Miles et al., 2012), has been reported to contribute to the generation of seizure activity. If blocking SLC26A11 reduces the increases in Cl− that occur during pathologies associated with cytotoxic edema, it may be possible to preserve the hyperpolarizing direction of GABAAR currents and reduce the generation of post-traumatic seizures.

In addition to the Cl− loading that occurs during excitotoxic insults, Cl− efflux may also be compromised. As the direction of KCC2 transport depends on the K+ gradient, small changes in extracellular K+ can have substantial effects on KCC2-mediated Cl− clearance. Additionally, a recent study demonstrated that glutamate activation of NMDARs leads to phosphorylation and
Figure 6. Cl− Influx via SLC26A11 Causes Cytotoxic Neuronal Edema Following Increased [Na+]i

(A and B) Cortical brain tissue tested 5 days following in vivo injection of LNP-encapsulated siRNAs shows that SLC26A11 No.1 siRNA selectively reduced SLC26A11 protein expression compared to β-actin. Controls show that luciferase siRNA had no effect on SLC26A11 expression. Columns in (A) represent samples from different rats.

(C) In vivo knockdown of SLC4A3, A8, and A10 with LNP-siRNAs results in no significant difference in the magnitude of neuronal swelling compared to a control (luciferase siRNA) in cortical brain slices imaged 5 days following the injection (p > 0.05, ANOVA).

(D) Two different siRNA constructs against SLC26A11 result in a significant reduction in the magnitude of veratridine-induced neuronal swelling compared to luciferase siRNA (p < 0.05, ANOVA).

(G and H) Example images of cortical neurons transfected with siRNA using lipid nanoparticle delivery show that SLC26A11 knockdown results in protection from veratridine-triggered swelling compared to neurons transfected with SLC4A8 siRNA. DiI staining (red) shows cell uptake of LNP-siRNA.

(E and F) The SLC26A11 blocker GlyH-101 significantly reduces the magnitude of neuronal swelling induced by increases in [Na+]i (E; p < 0.001, two-tailed Student's t test) and the resulting cell death measured by LDH release (F; p < 0.001, ANOVA).



thereby decreased expression of KCC2, leading to impaired recovery from excitotoxic Cl− loads (Lee et al., 2011). In that study, the authors were unable to identify the source of Cl− influx but showed that it was independent of NKCC1. If KCC2-mediated Cl− efflux is indeed compromised following cytotoxic edema, then, in addition to blocking Cl− influx, enhancing Cl− extrusion (Gagnon et al., 2013) might provide further benefit.

The identification of SLC26A11 as a significant Cl− entry pathway during pathological swelling triggered after Na+ entry suggests new strategies that could be developed toward reducing brain edema. There are numerous different pathways for Na+ entry that are activated during conditions such as hypoxia, stroke, and TBI. Our observations that cell death is



Figure 7. SLC26A11 Gene Product Is Required for Activation of an Outwardly Rectifying Cl− Channel That Is Activated by Depolarization

(A) Example image of a whole-cell voltage-clamped layer 4 neuron in a coronal brain slice.

(B) Voltage clamp protocol used to depolarize the neuron in the presence of a cocktail to inhibit known voltage-dependent ion channels.

(C) Left: top, example trace of outward current activated by depolarization; middle, magnitude of current is reduced in DIDS; bottom, subtraction showing the DIDS-sensitive component. Right: SLC26A11 siRNA transfection attenuates the DIDS-sensitive outward current.

(D and E) Summarized I/V curves demonstrate that SLC26A11 is required for activation of an outward Cl− conductance that is activated in depolarized neurons. Low extracellular chloride (10.5 mM), GlyH-101 (50 μM), and SLC26A11 LNP-siRNA all significantly reduce the magnitude of the DIDS-sensitive current compared to control and Luc-siRNA transfection. Scale bars in (A): right, 500 μm; left, 25 μm. Error bars represent SEM.



significantly reduced when overall Cl− entry is prevented suggest that therapeutic strategies to inhibit SLC26A11-dependent Cl− entry may have widespread benefit toward treating these different conditions.

EXPERIMENTAL PROCEDURES 



Imaging 

Live-cell imaging (brain slice) was performed with a two-photon laser-scanning microscope (Zeiss LSM510-Axioskop-2; Zeiss, Oberkochen, Germany) with a 40×/1.0 numerical aperture water-immersion objective lens, directly coupled to a Chameleon ultra2 laser (Coherent, Santa Clara, CA). CoroNa, SR101, and DiI were excited at 770 nm, and MQAE was excited at 760 nm. The fluorescence from each fluorophore was split using a dichroic mirror at 560 nm, and the signals were each detected with a dedicated photomultiplier tube after passing through an appropriate emission filter (DiI, SR101: 605 nm, 55 nm band pass; CoroNa, MQAE: 525 nm, 50 nm band pass). Transmitted light was simultaneously collected using understage infrared differential interference contrast optics and an additional photomultiplier tube. FLIM methodology is described in detail in the Extended Experimental Procedures.

Data Collection, Analysis, and Statistics 

Translational movement was removed using ImageJ software. Fluorescence signals were defined as $\Delta F/F = [(F_i - B_i) - (F_0 - B_0)]/(F_0 - B_0)$, where $F_i$ and $F_0$ are the fluorescence at a given time and the control-period mean, respectively, and $B_i$ and $B_0$ are the corresponding background fluorescence signals. Swelling of individual neurons in cortical slices was analyzed as the percent increase in cross-sectional area relative to a mean baseline period. The cross-sectional area of the neuron was calculated using the fluorescence boundary



Scale bar in (G) matches scale in (H). Luciferase controls are combined and plotted in both (C and D) and in Figure S5. For statistics on magnitude of swelling, ANOVA was performed comparing results from all siRNA groups (luciferase, SLC4A3, -A8, -A10, -A3+A8+A10, SLC26A11 No.1 and No.2). Only SLC26A11 No.1 and No.2 were significantly different from luciferase (control) siRNA, p < 0.05. Error bars represent SEM. See also Figures S4 and S5 and Movie S5.
of the neuron soma stained with CoroNa. To estimate the tissue volume from the two-dimensional images of hippocampal slices, a line was drawn to measure the diameter, and the volume was estimated from the equation for the volume of a sphere: $V = \frac{4}{3}\pi r^3$.
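The Data Collection definitions can be expressed as small helper functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, inputs are taken to be per-frame values already corrected for translational movement, and the baseline period is assumed to be the first `n_baseline` frames. It is not the authors' analysis code.

```python
import math
from statistics import mean


def delta_f_over_f(f_i: float, b_i: float, f_0: float, b_0: float) -> float:
    """Background-subtracted dF/F = ((F_i - B_i) - (F_0 - B_0)) / (F_0 - B_0).

    f_i, b_i: fluorescence and background at one time point.
    f_0, b_0: control-period means of fluorescence and background.
    """
    baseline = f_0 - b_0
    return ((f_i - b_i) - baseline) / baseline


def percent_of_baseline(areas: list[float], n_baseline: int) -> list[float]:
    """Cross-sectional soma area as % of the mean baseline area (baseline = 100%).

    Subtract 100 from each value to get the percent increase in area.
    """
    a_0 = mean(areas[:n_baseline])  # assumption: first n_baseline frames are the baseline period
    return [100.0 * a / a_0 for a in areas]


def sphere_volume(diameter: float) -> float:
    """Volume of a sphere, V = (4/3) * pi * r**3, from a measured diameter."""
    r = diameter / 2.0
    return (4.0 / 3.0) * math.pi * r ** 3


# Toy numbers for illustration only (not data from the study).
print(delta_f_over_f(f_i=150.0, b_i=10.0, f_0=110.0, b_0=10.0))
print(percent_of_baseline([100.0, 102.0, 98.0, 110.0, 120.0], n_baseline=3))
print(sphere_volume(2.0))
```

Because volume scales with the cube of the radius, a small error in the drawn diameter is roughly tripled (in relative terms) in the volume estimate.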

Experimental values are the mean ± SEM; baseline equals 100%; n is the number of experiments conducted (imaging data from >3 individual cells from each experiment were averaged for each n value so that equal weight was given to each experiment, unaffected by the number of cells imaged per experiment). Statistical tests were either a two-tailed Student's t test or an ANOVA with a Newman-Keuls post hoc test for comparison between multiple groups; p < 0.05 was accepted as statistically significant (*p < 0.05, **p < 0.01, ***p < 0.001).
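The weighting rule above (average cells within each experiment first, then treat each experiment as one n) can be sketched as follows. The helper names are hypothetical, the minimum-cells threshold reads ">3 cells" as at least 4 and is an assumption, and the t tests and Newman-Keuls comparisons themselves are not reproduced here.

```python
import math
from statistics import mean, stdev


def per_experiment_means(cells_by_experiment: dict[str, list[float]],
                         min_cells: int = 4) -> list[float]:
    """Collapse each experiment's per-cell values to one mean, so each experiment is one n.

    min_cells is an assumed threshold (">3 cells" read as at least 4).
    """
    means = []
    for exp_id, values in cells_by_experiment.items():
        if len(values) < min_cells:
            raise ValueError(f"{exp_id}: {len(values)} cells, need at least {min_cells}")
        means.append(mean(values))
    return means


def mean_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two experiments to estimate SEM")
    return mean(values), stdev(values) / math.sqrt(len(values))


# Toy per-cell swelling values (% of baseline); not data from the study.
swelling = {
    "exp1": [110.0, 120.0, 130.0, 120.0],
    "exp2": [100.0, 104.0, 108.0, 112.0],
    "exp3": [150.0, 150.0, 150.0, 150.0, 150.0, 150.0],
}
means = per_experiment_means(swelling)
m, sem = mean_sem(means)
print(f"n = {len(means)} experiments, mean = {m:.1f}%, SEM = {sem:.1f}%")
```

With these toy values, pooling all 14 cells would give a larger mean than averaging per experiment, because exp3 contributes the most cells; collapsing to one value per experiment removes that imbalance.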

More detailed methodology can be found in the Extended Experimental Procedures.

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, five figures, two tables, and five movies and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.03.029.
SUMMARY

Breathing is essential for survival and under precise 
neural control. The vagus nerve is a major conduit 
between lung and brain required for normal respira- 
tion. Here, we identify two populations of mouse 
vagus nerve afferents (P2ry1, Npy2r), each a few hundred neurons, that exert powerful and opposing
effects on breathing. Genetically guided anatomical 
mapping revealed that these neurons densely inner- 
vate the lung and send long-range projections to 
different brainstem targets. Npy2r neurons are 
largely slow-conducting C fibers, while P2ry1 neu- 
rons are largely fast-conducting A fibers that contact 
pulmonary endocrine cells (neuroepithelial bodies). 
Optogenetic stimulation of P2ry1 neurons acutely 
silences respiration, trapping animals in exhalation, 
while stimulating Npy2r neurons causes rapid, 
shallow breathing. Activating P2ry1 neurons did not 
impact heart rate or gastric pressure, other auto- 
nomic functions under vagal control. Thus, the vagus 
nerve contains intermingled sensory neurons con- 
stituting genetically definable labeled lines with 
different anatomical connections and physiological 
roles. 



INTRODUCTION 

Breathing is tightly regulated by the nervous system to ensure 
appropriate tissue oxygenation. Several classes of central and 
peripheral sensory neurons acutely regulate the respiratory cy- 
cle in response to changes in blood pH and gas composition, 
as well as external environment (Carr and Undem, 2003; Gon- 
zalez et al., 1994; Guyenet et al., 2010). Among these, sensory 
neurons of the vagus nerve are the major source of nerve fibers 
that innervate the lung and airways, and are important for 
normal breathing. The vagus nerve contains sensory neurons 
that provide critical information needed to control respiration 
rate, regulate airway tone and defense, and in some species, 
evoke cough (Canning et al., 2006; Carr and Undem, 2003; 
Coleridge and Coleridge, 2011; Trankner et al., 2014; Widdi- 
combe, 2001). However, the diversity of lung-innervating sen- 
sory neurons remains poorly characterized at a molecular level, 
and the specific neuron types that promote or restrict respiration remain genetically undefined.

The vagus nerve is the tenth cranial nerve, characterized by a wandering trajectory that provides extensive innervation of the neck, chest, and abdomen (Berthoud and Neuhuber, 2000). The vagus nerve controls not only respiration, but also basic physiological functions of the cardiovascular, immune, and digestive systems. Most vagal neurons (~80%) provide ascending sensory information (Foley and DuBois, 1937), receiving input from thoracic tissues like heart and lung, and abdominal tissues like stomach and intestine. Electrophysiological studies revealed both chemosensory and mechanosensory neurons of the vagus nerve (Berthoud and Neuhuber, 2000; Paintal, 1973). Within the airways, vagal sensory neurons detect irritants, cues associated with inflammation and illness, and mechanical stretch of the lung during cycles of inhalation and exhalation (Carr and Undem, 2003; Paintal, 1973; Widdicombe, 2001). The cell bodies of sensory fibers reside in pairs of ganglia at the base of the skull, including the adjacent nodose and jugular ganglia (the nodose/jugular complex). Afferent vagal axons enter the brain bilaterally through the jugular foramina and primarily target the nucleus of the solitary tract (NTS), a brainstem nucleus that transmits sensory information to deeper brain structures and descending motor nuclei (Berthoud and Neuhuber, 2000; Kubin et al., 2006).

We reasoned that the vagus nerve likely contains a diversity 
of molecularly distinct neuron types with different anatomical 
projections and functions. Previous descriptive classifications 
of vagal sensory neurons were based on neuron response 
properties like conduction velocity and adaptation rate (Carr 
and Undem, 2003) and did not enable genetic control for spe- 
cific analysis. Furthermore, classical procedures to manipulate 
vagus nerve function— surgical vagotomy and implantation of 
electrical stimulators— impact many neuron types in both the 
motor and sensory arms (Groves and Brown, 2005; Schachter 
and Saper, 1998). These procedures implicate the vagus nerve 
in many physiological systems and offer therapeutic options for 
several otherwise intractable diseases (Groves and Brown, 
2005; Schachter and Saper, 1998). However, because they 
lack cell specificity, they are blunt tools for analytical studies 
and cause unwanted side effects in patients. Gaining genetic 
access to the diversity of vagal sensory neurons might help 
disentangle the neural control of autonomic physiology. 
Here, we used a molecular and genetic approach to reveal 
the identity of two populations of breathing-control neurons in 
the vagus nerve. 
RESULTS

Identifying Cell Surface Receptors of the Sensory Vagus 
Nerve 

Molecularly distinct neuron subtypes have been classified within several sensory systems (Abraira and Ginty, 2013; Basbaum et al., 2009; Dong et al., 2001; Munger et al., 2009; Yarmolinsky et al., 2009). To identify markers for subtypes of vagal afferents, we used a genome-based strategy that previously enabled identification of families of olfactory receptors (Liberles and Buck, 2006; Liberles et al., 2009). Expression levels of ~400 G protein-coupled receptors (GPCRs) were quantified in nodose/jugular complex cDNA by qPCR. Candidate genes were cloned for cRNA riboprobe synthesis and examined for expression in neuronal subsets by in situ hybridization. These experiments revealed three GPCR genes, P2ry1, Npy2r, and Gpr65, to be expressed in subsets of vagus nerve afferents (Figure 1A). Each nodose/jugular complex contains ~2,300 neurons (Fox et al.,



Figure 1. Genetic Control of Sensory Neuron Types in the Vagus Nerve

(A) RNA in situ hybridization experiments in the nodose/jugular complex revealed that P2ry1, Npy2r, and Gpr65 are expressed in subsets of vagal sensory neurons.

(B) Two-color in situ hybridization experiments for the indicated genes revealed largely non-overlapping neuron populations. The numbers of cells expressing one receptor (red or green) or both receptors (yellow) were counted.

(C) The indicated Cre lines were crossed with lox-L10-GFP mice, and in offspring, fixed cryosections of the nodose/jugular complex were imaged by fluorescence microscopy. Native GFP fluorescence (green) and a fluorescent Nissl stain (gray) were visualized. Scale bars, 100 μm. See also Figures S1 and S2.



2001); Gpr65 is expressed in 10.2% of vagal sensory neurons (126/1,237, ~230 neurons per ganglion complex), P2ry1 is expressed in 11.6% of vagal sensory neurons (190/1,631, ~280 neurons per ganglion complex), and Npy2r is expressed in 29.2% of vagal sensory neurons (445/1,524, ~670 neurons per ganglion complex). Vagal NPY2R was proposed, and debated, to function in nutrient-evoked satiety (Karra and Batterham, 2010), while roles for vagal P2RY1 and GPR65 were not previously reported.

Two-color in situ hybridization analysis in the nodose/jugular complex revealed that Npy2r, P2ry1, and Gpr65 were predominantly expressed in different vagal sensory neurons (Figure 1B). Most Npy2r neurons did not express P2ry1 (99%, 669/674) or Gpr65 (100%, 419/419); most P2ry1 neurons did not express Gpr65 (94%, 267/285) or Npy2r (99%, 389/394); and most Gpr65 neurons did not express P2ry1 (90%, 154/172) or Npy2r (100%, 186/186). Thus, three major classes of vagal afferents are distinguishable by genetic markers and together account for ~50%-60% of nodose/jugular sensory neurons.
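The percentages above follow directly from the reported counts. The short check below recomputes them from the numbers given in the text (the variable and function names are hypothetical). Summing the three marker frequencies is reasonable only because the populations barely overlap; the sum is an upper bound on the fraction labeled by any marker, and it lands near 51%, at the low end of the ~50%-60% estimate.

```python
def percent(part: int, whole: int) -> float:
    """part/whole expressed as a percentage."""
    return 100.0 * part / whole


# (counted population, second marker, neurons lacking the second marker, neurons counted)
EXCLUSION_COUNTS = [
    ("Npy2r", "P2ry1", 669, 674),
    ("Npy2r", "Gpr65", 419, 419),
    ("P2ry1", "Gpr65", 267, 285),
    ("P2ry1", "Npy2r", 389, 394),
    ("Gpr65", "P2ry1", 154, 172),
    ("Gpr65", "Npy2r", 186, 186),
]

# Marker-positive neurons / vagal sensory neurons counted, from the preceding paragraph.
PREVALENCE = {"Gpr65": (126, 1237), "P2ry1": (190, 1631), "Npy2r": (445, 1524)}

for counted, other, lacking, total in EXCLUSION_COUNTS:
    print(f"{counted} neurons without {other}: {percent(lacking, total):.0f}% ({lacking}/{total})")

for marker, (pos, total) in PREVALENCE.items():
    print(f"{marker}: {percent(pos, total):.1f}% of vagal sensory neurons")

# Upper bound on the labeled fraction; a fair estimate only because overlap is small.
combined = sum(percent(pos, total) for pos, total in PREVALENCE.values())
print(f"combined: {combined:.1f}%")
```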

Genetic Control of Vagus Nerve Sensory Neurons 

Cre/LoxP technology enables powerful, genetically guided approaches for connectivity mapping and remote control of neural activity (Rogan and Roth, 2011). We generated P2ry1-ires-Cre, Gpr65-ires-Cre, and Npy2r-ires-Cre knockin mice (Figure S1), in which Cre recombinase is co-transcribed with the receptor gene and independently translated from an internal ribosome entry site (IRES) sequence (Kim et al., 1992). Each Cre knockin line was crossed with reporter mice harboring a Cre-dependent L10-GFP allele (lox-L10-GFP; similar reporter alleles are herein referred to as lox-reporter) (Krashes et al., 2014), and in offspring,
Figure 2. Visualizing Vagal Afferents in the Lung

(A) Cartoon depiction of the neural tracing strategy, which involved infection of a Cre-dependent AAV (AAV-flex-tdTomato) and/or a Cre-independent AAV (AAV-eGFP) in the nodose/jugular complex of ires-Cre knockin mice.

(B) Whole-mount analysis of native tdTomato fluorescence in a flattened lung lobe from a Vglut2-ires-Cre mouse infected with AAV-flex-tdTomato. Scale bar, 1 mm.

(C) Whole-mount analysis (maximum projection of stacked confocal images) of native tdTomato (tdT) and GFP fluorescence in the nodose/jugular complex of a Vglut2-ires-Cre mouse infected with AAV-flex-tdTomato and AAV-eGFP. Scale bar, 100 μm.

(D) Different ires-Cre lines were infected with AAV-flex-tdTomato and AAV-eGFP, and fibers were visualized in fixed lung cryosections by immunohistochemistry for tdTomato (red) and GFP (green). Scale bars, 1 mm.

subsets of vagal sensory neurons displayed bright, native GFP fluorescence (Figure 1C). Cre recombinase drives reporter expression to appropriate neurons of the nodose/jugular complex, as determined by two-color in situ hybridization analysis (Figure S1). In sum, 94% of P2ry1 neurons express reporter in P2ry1-ires-Cre; lox-Channelrhodopsin2-eYFP mice, 94% of Npy2r neurons express reporter in Npy2r-ires-Cre; lox-Channelrhodopsin2-eYFP mice, and 88% of Gpr65 neurons express reporter in Gpr65-ires-Cre; lox-tdTomato mice. In contrast, 3.0% of Npy2r neurons and 2.4% of Gpr65 neurons express reporter in P2ry1-ires-Cre; lox-Channelrhodopsin2-eYFP mice. As is commonly observed using Cre/LoxP techniques, we detected some reporter-positive, receptor-negative neurons (19%-27%), which could be due to imperfect detection of receptor expression by in situ hybridization or transient expression of Cre recombinase during development (Schmidt-Supprian and Rajewsky, 2007).

In addition, we obtained Vglut2-ires-Cre and Chat-ires-Cre mice (Rossi et al., 2011; Vong et al., 2011), which provided important tools for global control of sensory neurons and motor neurons, respectively. Calcium imaging experiments in acute neuron cultures from the nodose/jugular complex of Vglut2-ires-Cre; lox-tdTomato mice revealed that 100% (168/168) of KCl-responsive sensory neurons expressed tdTomato. Furthermore, in Vglut2-ires-Cre; lox-L10-GFP mice, GFP was expressed in 99.4% (632/636) of sensory neurons, but only in rare motor neurons (6/845 dorsal motor nucleus of the vagus, or DMV, neurons and 28/320 nucleus ambiguus neurons) (Figure S2). Conversely, Chat-ires-Cre mice drive reporter expression in most motor neurons (376/442 DMV neurons and 84/123 nucleus ambiguus neurons), but not in sensory neurons (0/599). In triple transgenic Vglut2-ires-Cre; lox-tdTomato; Chat-GFP mice, motor and sensory fibers are differentially visualized and partially segregated within the vagus nerve trunk (Figure S2). Together, this toolbox of Cre lines enables differential genetic access to the vagus nerve motor and sensory arms, as well as three molecularly distinct sensory neuron subpopulations.

Visualizing Lung-to-Brain Sensory Neurons of the Vagus 
Nerve 

We asked whether any of these genetically defined vagal afferents innervated the lung and thus might control breathing. The peripheral projections of Cre-expressing sensory neurons were traced using fluorescent reporters introduced by adeno-associated virus (AAV) infection of the nodose/jugular complex (Figure 2A). Since each ires-Cre line drives reporter expression in locations other than the nodose/jugular complex, AAV infections ensured that fluorescent afferents were specifically derived from the sensory vagus nerve. Infection efficiency was assessed by injection of Vglut2-ires-Cre; lox-L10-GFP mice with an AAV containing a Cre-dependent tdTomato allele (AAV-flex-tdTomato). At 4 weeks after infection, tdTomato fluorescence was observed in ~45% of GFP-containing neuronal cell bodies (Figure S3) and was sufficiently bright to detect by whole-mount analysis of the nodose/jugular complex and nerve trunk. Cre-expressing cells were similarly infected in P2ry1-ires-Cre, Npy2r-ires-Cre, and Gpr65-ires-Cre mice (Figure S3). Red fluorescence was not observed in uninfected Cre mice or in wild-type mice infected with AAV-flex-tdTomato (Figure S3).

Infection of Vglut2-ires-Cre mice with AAV-flex-tdTomato yielded bright red fibers throughout the lungs and airways that could be readily visualized by whole-mount analysis of a flattened lung lobe (Figure 2B). A dual infection strategy was used to quantify airway innervation by vagal afferent populations labeled in different Cre lines (Figures 2C and 2D). The nodose/jugular complex was simultaneously infected with AAV-flex-tdTomato and a second AAV containing a Cre-independent GFP allele for normalization (AAV-eGFP). Dual immunohistochemistry for GFP and tdTomato was performed on lung cryosections obtained 4 weeks after infection. The areas of tdTomato- and GFP-derived fluorescence were measured in a 17.5-mm² lung region containing several principal airways, and the ratio of tdTomato/GFP (T/G) labeling was calculated. Dual virus infection experiments in Vglut2-ires-Cre mice yielded GFP- and tdTomato-containing fibers that were highly co-localized throughout the lung and a benchmark T/G fluorescence ratio of 0.79. Related experiments using other Cre lines (Figure 2E) revealed that Npy2r and P2ry1 neurons provided dense innervation of the lung (0.42 T/G and 0.18 T/G, respectively, or 54% and 23% of the T/G observed in Vglut2-ires-Cre mice), while Gpr65 neurons did not (0.02 T/G, 3%). Npy2r and P2ry1 neurons account for the majority of lung-innervating vagal fibers, but not all. Furthermore, Npy2r and P2ry1 neurons do not exclusively innervate the airways, as labeled fibers were also detected in the heart and stomach. It is possible that each neuron type performs a similar sensory function in multiple tissues (such as detecting organ stretch or inflammation), and/or that additional markers are needed to subdivide these neuron classes further.
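The normalization described above amounts to two ratios: each line's tdTomato/GFP area ratio, then that ratio as a percentage of the pan-sensory Vglut2-ires-Cre benchmark. In this sketch the function names are hypothetical and the inputs are the rounded ratios quoted in the text, so recomputing gives about 53% for Npy2r rather than the reported 54%; the one-point gap presumably reflects rounding of the underlying values.

```python
# Rounded T/G ratios quoted in the text.
VGLUT2_BENCHMARK = 0.79
LINE_RATIOS = {"Npy2r": 0.42, "P2ry1": 0.18, "Gpr65": 0.02}


def tg_ratio(tdtomato_area: float, gfp_area: float) -> float:
    """Cre-dependent (tdTomato) labeled area divided by Cre-independent (GFP) labeled area.

    GFP marks infected vagal fibers regardless of Cre, so dividing by its area
    normalizes for how many neurons were infected (our reading of the method).
    """
    if gfp_area <= 0:
        raise ValueError("GFP-labeled area must be positive")
    return tdtomato_area / gfp_area


def percent_of_benchmark(ratio: float, benchmark: float = VGLUT2_BENCHMARK) -> float:
    """A line's T/G ratio relative to the Vglut2-ires-Cre benchmark, in percent."""
    return 100.0 * ratio / benchmark


for line, ratio in LINE_RATIOS.items():
    print(f"{line}: T/G {ratio:.2f} -> {percent_of_benchmark(ratio):.0f}% of Vglut2")
```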

Within the lung, sensory fibers visualized in P2ry1-ires-Cre and Npy2r-ires-Cre mice displayed different arborization patterns and terminal morphologies. In both lines, the majority of fibers coursed along the major airways beneath and parallel to the smooth muscle layer (Figure 2F). P2ry1 neurons, but not Npy2r neurons, formed stereotyped candelabra endings at neuroepithelial bodies, clusters of pulmonary secretory cells embedded within the epithelium and revealed by calcitonin gene-related peptide (CGRP) immunoreactivity (Figure 2G) (Brouns et al., 2009). P2ry1 neurons account for most or all vagal innervation of neuroepithelial bodies, based on the frequency of tdTomato-positive fibers in Vglut2-ires-Cre and P2ry1-ires-Cre mice (Figure 2H). Vagal afferents visualized in



(E) Quantitative analysis of lung innervation in 17.5-mm² lung regions expressed as an area ratio of T/G-derived immunofluorescence (mean ± SEM; see Results and Extended Experimental Procedures for additional detail on T/G calculation).

(F) High-resolution image of a representative vagal afferent beneath the epithelial layer (E) of a major airway (airway lumen, L). Scale bar, 20 μm.

(G) Representative P2ry1 candelabra terminal (tdT fluorescence) at a neuroepithelial body (CGRP immunostaining, green). Scale bar, 20 μm.

(H) The number of neuroepithelial bodies (NEBs) innervated by each neuron type after visualization with AAV-flex-tdTomato and normalization with AAV-eGFP (n = 3-5, mean ± SEM, **p < 0.01, and ***p < 0.001). See also Figure S3.
Figure 3. Characterization of Vagal P2ry1 and Npy2r Neurons

(A) Cartoon depiction of optogenetic strategy. The vagus nerve is surgically exposed in anesthetized mice and illuminated to activate ChR2-expressing sensory neurons.

(B) Whole-nerve electrophysiological recordings in Vglut2-ChR2 mice revealed light-induced action potentials.

(C) Compound action potentials following brief optogenetic stimulation (arrow) in Vglut2-ChR2, P2ry1-ChR2, and Npy2r-ChR2 mice. A and C fibers were classified based on conduction velocity (Figure S4). x = 5 ms; y = 110 μV (Vglut2), 62 μV (P2ry1), 160 μV (Npy2r); dashed inset, x = 1.45 ms.

(D) The ratio of A to C fibers was calculated by integrating the corresponding peak areas in the compound action potential; dashed line: A/C ratio of 1 (n = 5-8, mean ± SEM, and *p < 0.05).

(E) Calcium imaging of single-neuron responses to capsaicin (2 μM) and KCl (50 mM) in acute ganglia cultures from P2ry1-ires-Cre; lox-L10-GFP and Npy2r-ires-Cre; lox-L10-GFP mice. Left panels: color scale = 340/380 nm Fura-2 excitation ratio. Right panels: neurons expressing GFP (green, native fluorescence) and responding to capsaicin (red) are superimposed and counted. Scale bar, 100 μm.

(F) Representative traces for single neurons imaged in (E). See also Figure S4.




Npy2r-ires-Cre mice did not innervate neuroepithelial bodies,
but were instead enriched near alveoli in the lung respiratory 
zone (Figure S3). 

Physiological and Molecular Characterization of Vagal 
P2ry1 and Npy2r Neurons 

Vagal afferents in the lung are a heterogeneous group of
fast-conducting myelinated A fibers and slow-conducting
unmyelinated C fibers (Carr and Undem, 2003). We used a
channelrhodopsin-assisted approach to measure the specific conduction
velocities of vagal P2ry1 and Npy2r neurons. We independently
crossed wild-type, Vglut2-ires-Cre, P2ry1-ires-Cre, and
Npy2r-ires-Cre mice with reporter mice containing a Cre-dependent
channelrhodopsin-2 (ChR2) allele (lox-Channelrhodopsin2-eYFP);
offspring of each cross are subsequently referred to as
driver-ChR2 or, in controls, lox-ChR2 mice. Optogenetic activation
of vagal fibers was achieved in anesthetized animals by
focal illumination of the vagus nerve trunk (Figure 3A). Whole
nerve recordings revealed robust light-induced action potentials
that were not similarly observed in control animals lacking
Cre recombinase (Figure 3B).

Brief optogenetic stimulation (0.8 ms) of all sensory neurons
in Vglut2-ChR2 mice generated a compound action potential
resulting from summation of slow-conducting and fast-conducting
neurons (Figure 3C). Propagation speed was
calculated by varying the distance between the optic fiber and
recording electrodes (Figure S4), and two major peaks were
resolved with conduction velocities characteristic of A fibers
(10.2 ± 4.0 m/s) and C fibers (0.71 ± 0.04 m/s). Similar experiments
in P2ry1-ChR2 and Npy2r-ChR2 mice revealed that
most P2ry1 neurons were A fibers and most Npy2r neurons
were C fibers (Figure 3D).
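The latency-versus-distance approach used here can be sketched as a straight-line fit: if a peak's latency grows linearly with the distance between the optic fiber and the recording electrode, the inverse of the slope is the conduction velocity. The function below is a minimal illustration, not the authors' analysis code; the name `conduction_velocity`, the use of ordinary least squares, and the example numbers are assumptions chosen for clarity rather than data from this study.

```python
def conduction_velocity(distances_mm, latencies_ms):
    """Estimate conduction velocity (m/s) from peak latency vs. travel distance.

    Fits latency = slope * distance + intercept by ordinary least squares.
    Since 1 mm/ms == 1 m/s, the inverse slope (mm/ms) is directly in m/s.
    The intercept absorbs fixed delays that do not scale with distance.
    """
    n = len(distances_mm)
    if n < 2 or n != len(latencies_ms):
        raise ValueError("need at least two paired distance/latency measurements")
    mean_d = sum(distances_mm) / n
    mean_t = sum(latencies_ms) / n
    # Sums of squares and cross-products about the means
    sxx = sum((d - mean_d) ** 2 for d in distances_mm)
    sxy = sum((d - mean_d) * (t - mean_t) for d, t in zip(distances_mm, latencies_ms))
    if sxx == 0:
        raise ValueError("distances must vary to fit a slope")
    slope_ms_per_mm = sxy / sxx
    if slope_ms_per_mm <= 0:
        raise ValueError("latency should increase with travel distance")
    return 1.0 / slope_ms_per_mm


# Hypothetical fast (A-like) peak: latency rises 0.1 ms per mm of travel.
print(round(conduction_velocity([2, 4, 6, 8], [0.7, 0.9, 1.1, 1.3]), 2))  # prints 10.0
```

Fitting a line rather than dividing one distance by one latency lets the intercept absorb fixed delays (for example, the time needed to activate the opsin) that would otherwise bias a single-point estimate.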

Sensory afferents in the lung are also heterogeneous with
respect to capsaicin sensitivity, with most C fibers being
capsaicin-responsive. Acute cultures of nodose/jugular ganglia were
prepared from P2ry1-ires-Cre;lox-L10-GFP and
Npy2r-ires-Cre;lox-L10-GFP mice, and responses of single, genetically
defined neurons were measured by calcium imaging with Fura-2.
Capsaicin activated 60.7% (1,087/1,791) of all vagal sensory
neurons, 0% (0/35) of P2ry1 neurons, and 81.3% (447/550) of
Npy2r neurons (Figures 3E and 3F). Furthermore, two-color
in situ hybridization analysis revealed that most Npy2r neurons
(62%, 193/310) expressed the gene encoding the capsaicin
receptor TRPV1, while most P2ry1 neurons (95%, 213/224) did
not (Figure S4). For comparison, the mechanoreceptor Piezo2
(Coste et al., 2010) was instead co-expressed in many P2ry1
neurons (44%, 170/388), rare Npy2r neurons (4.7%, 14/297),
and no Gpr65 neurons (0%, 0/94) (Figure S4). Taken together,
these data indicate that P2ry1 neurons are mostly fast-conducting,
capsaicin-insensitive A fibers, while Npy2r neurons are mostly
slow-conducting, capsaicin-responsive C fibers.

Optogenetic Control of the Respiratory Cycle 

We reasoned that lung-innervating sensory neurons labeled in
P2ry1-ires-Cre and Npy2r-ires-Cre mice might control breathing.
Classical techniques to study vagus nerve function, namely surgical
vagotomy and bulk electrical stimulation, cannot distinguish
the specific contributions of different co-fasciculating fibers.
Here, we used optogenetic approaches in freely breathing,
anesthetized lox-ChR2, Vglut2-ChR2, Chat-ChR2, P2ry1-ChR2,
Gpr65-ChR2, and Npy2r-ChR2 mice to query the roles of particular
vagal neuron populations in respiratory physiology.

Figure 4. Remote Control of Breathing

(A) Respiratory effects following focal vagus nerve
illumination (yellow shading) in lox-ChR2, Vglut2-ChR2,
P2ry1-ChR2, Npy2r-ChR2, and Gpr65-ChR2 mice.
Respiratory rhythms (representative traces) were measured
using a pressure transducer via a tracheal cannula. Changes in
respiration rate and minute volume were calculated over time,
with each data point reflecting a 5 s bin.

(B) Light-induced changes in respiration rate, tidal
volume, and minute volume were calculated over
the 10 s trial (n = 4-8 as indicated, mean ± SEM,
and ***p < 0.001). See also Figures S5 and S6.

Optogenetic activation of all vagal sensory neurons in
Vglut2-ChR2 mice revealed powerful light-induced changes in
respiration (Figure 4). In each animal tested, illumination caused
an abrupt pause in breathing that persisted for an average
of 6.2 s, followed by a secondary phase of shallow breathing
that lasted until the light was turned off. Over the 10 s trial,
light-induced activation of vagal sensory neurons caused a 54%
decrease in respiration rate, a 52% decrease in tidal volume,
and a 79% decrease in minute volume. In contrast, similar
changes in respiration were not observed in Chat-ChR2 or
lox-ChR2 mice (Figures 4 and S5). These findings are consistent
with a pronounced role for vagal sensory neurons in breathing
regulation via reflex circuitry involving descending spinal motor
neurons.

Next, we asked whether vagal subpopulations labeled in
P2ry1-ires-Cre, Gpr65-ires-Cre, and Npy2r-ires-Cre mice elicited
similar effects (Figure 4). Activation of P2ry1 neurons, which
represent 11.6% of the vagal sensory neuron repertoire,
caused an immediate and striking inhibition of respiration whose
acute duration (7.9 s) was statistically similar to that observed
in Vglut2-ChR2 mice. However, the secondary phase involved
full breaths that were rarer than those observed during activation
of all vagal afferents in Vglut2-ChR2 mice (Figure S6). Over the
10 s trial, activating P2ry1 neurons caused a 72% decrease in
respiration rate, no significant effect on tidal volume (5.5%
increase), and a 67% decrease in minute volume. The different
respiratory effects observed following light stimulation in
P2ry1-ChR2 and Vglut2-ChR2 mice suggested that other vagal
sensory neurons also contribute to breathing control.

Activating Npy2r neurons evoked a different respiratory
response characterized by rapid and shallow breathing. In
Npy2r-ChR2 mice, we observed a light-induced 68% increase
in respiration rate, a 44% decrease in tidal volume, and a 3%
decrease in minute volume. Light-induced respiratory effects in
P2ry1-ires-Cre and Npy2r-ires-Cre mice appear to be due to
stimulation of different sensory neuron populations; contributions
from motor fibers are unlikely, since bulk activation of motor
neurons in Chat-ires-Cre mice had no effect on respiration rate,
tidal volume, or minute volume (Figure S5). Likewise, activating
Gpr65 neurons had no significant effect on these breathing
parameters; together with their sparse lung innervation, this
suggests that Gpr65 neurons mediate other vagus nerve functions.
The different respiratory effects of Npy2r, P2ry1, and Gpr65 neurons,
or lack thereof, suggest that the vagus nerve contains functionally
segregated labeled lines within the context of the respiratory
system.

Figure 5. P2ry1 Neurons Trap Respiration in a State of Exhalation

(A) Theoretical models of lung volume changes during light-induced inhalation and exhalation trapping.

(B) Representative data showing changes in lung volume following optogenetic activation of vagal afferents in P2ry1-ChR2 mice.

(C) Percentage change in total lung volume evoked by light in P2ry1-ChR2 mice (n = 5) and control mice (lox-ChR2) (n = 8). Total lung volume was calculated by
integrating lung volume across 10 s periods before and during light stimulation.

(D) The percentage of time P2ry1-ChR2 mice and control mice were in a high lung volume state before, during, and after light stimulation. High volume state was
defined as greater than mean volume during tidal breathing (mean ± SEM, *p < 0.05, and ***p < 0.001).

Activating P2ry1 neurons caused an acute pause in respiration that
could be due to sustained inhalation (breath holding) or exhalation.
These potential mechanisms could be distinguished by
measuring light-induced changes in lung volume (Figure 5): animals
trapped in inhalation would have increased lung volume,
while animals trapped in exhalation would have decreased
lung volume. We observed that light-induced activation of
P2ry1 neurons decreased lung volume (68% decrease in integrated
lung volume over the 10 s trial), consistent with exhalation
trapping, and decreased the time spent with lungs in a high volume
state (77% decrease, with high volume state defined as lung volume
greater than mean lung volume during tidal breathing).
Similar experiments in control animals lacking a Cre driver
(lox-ChR2) showed no significant change in lung volume
(0.6%) or time in a high volume state (0.4%).
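The lung-volume analysis above rests on two simple operations: integrating airflow over time to obtain relative lung volume, and scoring the fraction of time that volume exceeds the mean volume during tidal breathing. A minimal sketch, assuming a uniformly sampled airflow trace with inspiration positive, trapezoidal integration without drift correction, and hypothetical function names and values (none of which come from the study):

```python
def integrate_flow(flow, dt):
    """Cumulative trapezoidal integral of airflow, giving lung volume relative to t = 0.

    No drift correction is applied; real traces may need detrending first.
    """
    volume = [0.0]
    for prev, curr in zip(flow, flow[1:]):
        volume.append(volume[-1] + 0.5 * (prev + curr) * dt)
    return volume


def fraction_high_volume(volume, threshold):
    """Fraction of samples in which lung volume exceeds the threshold."""
    if not volume:
        raise ValueError("empty volume trace")
    return sum(v > threshold for v in volume) / len(volume)


def percent_change(before, during):
    """Percent change of a measure during stimulation relative to before."""
    return 100.0 * (during - before) / before


# Hypothetical traces (inspiration positive), sampled at dt = 1 time unit.
baseline = integrate_flow([1.0, 1.0, -1.0, -1.0], dt=1.0)    # one tidal breath
threshold = sum(baseline) / len(baseline)                     # mean tidal-breathing volume
stimulated = integrate_flow([-0.5, -0.5, 0.0, 0.0], dt=1.0)   # volume falls and stays low
print(fraction_high_volume(baseline, threshold), fraction_high_volume(stimulated, threshold))
```

Under exhalation trapping the integrated volume stays below the tidal-breathing mean, so the high-volume fraction drops toward zero; under inhalation trapping it would rise instead, which is the distinction Figure 5A illustrates.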

Next, we asked whether P2ry1 and Npy2r neurons control
other autonomic functions of the vagus nerve. We used optogenetic
approaches to activate vagal sensory neurons in lox-ChR2,
Vglut2-ChR2, P2ry1-ChR2, and Npy2r-ChR2 mice, and
measured heart rate by electrocardiogram (ECG) recordings
and gastric pressure with a cannulated pressure sensor. Activating
all sensory neurons in Vglut2-ChR2 mice caused a profound
drop in heart rate (-85%) and a decrease in gastric pressure
(-11.3%), with both tonic and phasic components affected.
Specifically activating P2ry1 neurons, however, had no significant
effect on heart rate (-3.8%) or gastric pressure (-2.2%) (Figure 6),
while activating vagal Npy2r neurons decreased both heart rate
(-41.2%) and gastric pressure (-12.8%) (Figure S5). It is
possible that the Npy2r-ires-Cre allele drives reporter expression
in multiple neuron subtypes with specific functions, or in a single
neuron type that affects multiple organ systems. Results with
P2ry1 neurons indicate that vagal control of breathing can be
dissociated from effects on heart rate and gastric pressure by
acute and selective stimulation of particular sensory neurons.

Regionalization of Sensory Neuron Inputs in the 
Brainstem 

The different respiratory effects evoked by P2ry1 and Npy2r
sensory neurons suggest engagement of distinct higher-order
neural circuits. The axons of vagal sensory neurons densely
innervate the NTS, area postrema, and spinal trigeminal nucleus
(Berthoud and Neuhuber, 2000; Kalia and Mesulam, 1980), and
topographic organization of vagal inputs in the NTS based on
either physiological function or organ innervation has been
proposed (Altschuler et al., 1989; Bailey et al., 2006; Katz and Karten,
1983; Kubin et al., 2006) but remains debated (Andresen et al., 2012).
Here, we used genetically encoded neural tracers to ask how
inputs from P2ry1 and Npy2r neurons are organized centrally.

Figure 6. P2ry1 Neurons Acutely Control Breathing, but Not Gastric Pressure or Heart Rate

(A) Measurements (representative traces) of respiratory rhythm, heart rhythm, and gastric pressure following focal illumination (yellow shading) of the vagus nerve
in anesthetized P2ry1-ChR2, Vglut2-ChR2, and control (lox-ChR2) mice. Heart rate was measured by ECG, with boxed insets showing rhythms before (1, left),
during (2, middle), and after (3, right) light exposure. Intraluminal gastric pressure was measured using a pressure transducer inserted through the pyloric
sphincter.

(B) Changes in heart rate (normalized to a 30 s pre-stimulus period) were calculated over time, with each data point reflecting a 5 s bin.

(C) Changes in heart rate and gastric pressure were calculated over the first 10 s or 3 min of light stimulation, respectively (mean ± SEM and ***p < 0.001). See also
Figure S5.

Figure 7. Non-Overlapping Central Projections of Vagal P2ry1 and Npy2r Neurons

(A) The nodose/jugular complex of P2ry1-ires-Cre and Npy2r-ires-Cre mice was infected with AAV-flex-tdTomato (AAV-flex-tdT) and AAV-eGFP. At 4 weeks after
infection, fixed brainstem cryosections were analyzed by two-color immunohistochemistry for tdTomato (red) and eGFP (green). Representative images of
anterior and posterior brainstem containing the vagal projection field are shown (full rostral-caudal series, Figure S7). Solitary tract, sol; fourth ventricle, 4V; central
canal, CC; area postrema, AP; L-NTS includes ventral, lateral, ventrolateral, interstitial, and intermediate NTS subnuclei; and M-NTS includes dorsolateral,
dorsomedial, medial, and commissural NTS subnuclei. Scale bar, 100 μm.

(B) Quantitative analysis of innervation by P2ry1 and Npy2r fibers in L-NTS, M-NTS, and AP, expressed as an area ratio of T/G fluorescence. Fluorescence was
summed in every eighth section (25 μm) from Bregma -6.4 mm to -7.8 mm (n = 4, mean ± SEM, *p < 0.05, **p < 0.01, and ***p < 0.001). See also Figure S7.

We used AAV-directed neural tracing technology to visualize
axons of Cre-expressing vagal sensory neurons in the brainstem
(Figures 7 and S7). We infected the nodose/jugular complex with
both AAV-flex-tdTomato and AAV-eGFP to visualize Cre-expressing
neurons (red) in the context of all types of vagal sensory
fibers (green). Dual infection of Vglut2-ires-Cre mice yielded red
and green fibers penetrating the brain ipsilateral to the injection
site and dense arborizations in both the NTS and area postrema.
Arborizations occurred bilaterally in the NTS, with enrichment
ipsilateral to the infected ganglia. Next, we performed similar
experiments in P2ry1-ires-Cre mice and observed that the projection
field of P2ry1 neurons was spatially confined and did not extend
over the entire vagal NTS. Vagal P2ry1 neurons arborized immediately
proximal to the ascending fiber tract in the lateral region of
the NTS, and these lateral branches were observed throughout
the anterior-posterior axis. Intriguingly, the dorsal respiratory
group, which contains second-order neurons that control
respiration, is located in the lateral NTS (Saether et al., 1987; Speck and
Feldman, 1982). Similar experiments in Npy2r-ires-Cre mice
revealed that Npy2r neurons instead innervated a different NTS
region, with fibers predominantly emerging in the medial posterior
aspects of the vagal NTS and area postrema, regions known to
receive pulmonary C fiber input (Kubin et al., 1991, 2006). We
quantified innervation density along the medial-lateral axis of the
NTS and in the area postrema by calculating the area of red fibers
and normalizing to the area of green fibers. We observed 21-fold
higher levels of P2ry1 neuron-derived fluorescence compared
with Npy2r neuron-derived fluorescence in laterally arborizing
NTS fibers. In contrast, we observed 6-fold and 43-fold higher
levels of Npy2r neuron-derived fluorescence compared with
P2ry1 neuron-derived fluorescence in medially arborizing NTS
fibers and the area postrema, respectively. Thus, P2ry1 and Npy2r
neurons have non-overlapping and highly regionalized central
projections that innervate different NTS subnuclei. Together, these
findings suggest that P2ry1 and Npy2r neurons engage different
higher-order neural circuits and are consistent with a brainstem
map of vagal inputs that is at least partially linked to physiological
function.
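The innervation index used here, red (tdTomato) fiber area normalized to green (eGFP) fiber area within a region and summed across sections, can be illustrated with simple intensity thresholding. This is a hedged sketch: the fixed threshold, the nested-list image representation, and the names `fiber_area` and `tg_ratio` are illustrative assumptions; the segmentation actually used is described in the Extended Experimental Procedures.

```python
def fiber_area(channel, region, threshold):
    """Count pixels inside a region mask whose intensity exceeds a threshold."""
    return sum(
        1
        for row_values, row_mask in zip(channel, region)
        for value, inside in zip(row_values, row_mask)
        if inside and value > threshold
    )


def tg_ratio(sections, threshold):
    """Red (tdTomato) fiber area / green (eGFP) fiber area, summed over sections.

    Each section is a (red, green, region) tuple of equally sized 2D lists,
    where region is a boolean mask for one subregion (e.g., L-NTS, M-NTS, or AP).
    """
    red_total = green_total = 0
    for red, green, region in sections:
        red_total += fiber_area(red, region, threshold)
        green_total += fiber_area(green, region, threshold)
    if green_total == 0:
        raise ValueError("no green fibers detected in region")
    return red_total / green_total


# One hypothetical 2x2 section; the whole field counts as the region of interest.
red = [[10, 0], [0, 0]]
green = [[10, 10], [0, 10]]
region = [[True, True], [True, True]]
print(tg_ratio([(red, green, region)], threshold=5))
```

A fold difference such as the 21-fold lateral enrichment would then be the ratio of two such T/G values for the same subregion, for example `tg_ratio(p2ry1_sections, t) / tg_ratio(npy2r_sections, t)` with hypothetical section lists. Normalizing to eGFP area corrects for animal-to-animal differences in overall infection efficiency.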



DISCUSSION 

The vagus nerve provides the major sensory innervation of the 
lung and mediates basic physiological functions in breathing 
control and respiratory defense. Understanding the diversity of 
lung-innervating sensory neurons is an essential step toward dis- 
entangling the neural control of respiration. 

Lung-innervating sensory neurons have been distinguished by
their response kinetics and adaptation rates (Carr and Undem,
2003). Three types of sensory neurons were described based
on these parameters: C fibers, rapidly adapting stretch
receptors (RARs), and slowly adapting stretch receptors (SARs).
RARs and SARs both respond to lung inflation, with different
adaptation rates, while only RARs respond to lung deflation.
RARs have also been proposed, along with C fibers, to detect
some irritants and cytokines and perhaps mediate cough
responses (Coleridge and Coleridge, 2011). There may be
multiple subclasses of RARs, SARs, and C fibers, and a
limitation of this classification scheme is that it does not enable
genetic analysis for specific functional manipulation. Here, we
initiated an alternative approach to sensory neuron classification
in the vagus nerve based on receptor expression.

We identified one vagal sensory neuron type that expresses
the purinergic receptor P2RY1, and generated P2ry1-ires-Cre
mice to enable genetic access for anatomical mapping,
physiological characterization, and remote control of neural activity.
P2ry1 neurons innervate the lung with characteristic candelabra
terminals that account for most or all vagal innervation of
neuroepithelial bodies, poorly understood clusters of pulmonary
endocrine cells. Ascending axons of P2ry1 neurons cross the jugular
foramina and display highly regionalized and stereotyped inputs
to the lateral NTS, which contains the dorsal respiratory group, a
brainstem nucleus that regulates breathing (Saether et al., 1987;
Speck and Feldman, 1982). Channelrhodopsin-assisted
conduction velocity measurements showed that P2ry1 neurons are
mostly fast-conducting A fibers, consistent with the observation
that vagal sensory neurons innervating neuroepithelial bodies
are myelinated (Brouns et al., 2009). Furthermore, P2ry1 neurons
are capsaicin-insensitive and do not express TRPV1; instead,
many express the mechanoreceptor Piezo2. Optogenetic
activation of P2ry1 neurons caused an acute and dramatic pause
in breathing, trapping animals in a state of exhalation. The
Hering-Breuer inflation reflex, first reported in 1868, is a vagally
mediated respiratory reflex evoked by pulmonary stretch-detecting
SARs that innervate the lateral NTS and cause an inhibition
of inspiration (Schelegle and Green, 2001). However, the
Hering-Breuer inflation reflex also evokes a mild tachycardia
that we did not observe following activation of P2ry1 neurons.
Furthermore, pulmonary stretch receptors are thought to reside
in the smooth muscle, whereas P2ry1 neurons innervate
neuroepithelial bodies. An alternative possibility is that P2ry1 neurons
represent a different type of A fiber distinct from RARs and SARs;
the recent proposal of an A fiber nociceptor termed HTARs (Yu
et al., 2007) supports the possibility of additional A fiber types
and highlights the need for better cell classification schemes.

A second class of vagal sensory neuron is defined by expression
of NPY2R, and Npy2r-ires-Cre mice were likewise generated.
Npy2r neurons display enriched innervation of the
alveoli-containing respiratory zone of the lung and do not contact
neuroepithelial bodies. Npy2r neurons are largely slow-conducting
C fibers, express the capsaicin receptor TRPV1, and respond to capsaicin
in single neuron imaging experiments involving acute cultures of
nodose/jugular ganglia. Centrally, Npy2r neurons target a medial
posterior region of the NTS that receives pulmonary C fiber input
(Kubin et al., 1991), and this region is distinct from the innervation
zone of P2ry1 neurons. Optogenetic activation of Npy2r neurons
caused rapid and shallow breathing, a respiratory effect
reminiscent of certain pulmonary defense responses (Coleridge
and Coleridge, 2011). Rapid and shallow breathing is a classical
response evoked by several C fiber-activating stimuli, including
bradykinin, histamine, capsaicin, irritants, and pulmonary
congestion (Coleridge and Coleridge, 2011; Coleridge et al., 1983).
Based on these findings, lung-innervating Npy2r neurons are
likely pulmonary nociceptors.

Powerful and opposing effects on respiratory physiology were 
evoked by activating only a few hundred P2ry1 or Npy2r neurons 
in the sensory vagus nerve. Based on evidence presented here, 
these genetic markers label fundamentally different neuron types 
within the context of the respiratory system. It is possible that 
these neuron classes can be further subdivided into even smaller 
neuron groups with more specific organ targets and functions. 
Additional studies involving other Cre driver lines and perhaps 
more complex approaches such as intersectional genetics 
may help further delineate functionally relevant neuron types 
(Dymecki and Kim, 2007). P2ry1 and Npy2r sensory neurons 
innervate the lung as well as other tissues, but their brainstem 
projections are nevertheless strikingly distinct. It is possible 
that Cre-expressing afferents from other physiological systems 
influence respiratory responses. In this scenario, a prime candi- 
date would be fibers from the cardiovascular system. However, 
in P2ry1 mice, a role for cardiovascular fibers seems unlikely, as 
activating P2ry1 neurons did not impact heart rate, as would be 
expected for known cardiac, aortic body, and carotid body
reflexes. Furthermore, carotid body chemoreceptors promote
rather than inhibit inspiration, and gut fibers seem unlikely to
affect respiration. Instead, a parsimonious interpretation is
that observed effects on breathing are mediated by lung-derived 
afferents. 

Neuron-selective optogenetic experiments revealed that vagal 
control of breathing could be dissociated from vagal control of 
heart rate or gastric pressure. These findings indicate that the va- 
gus nerve contains co-fasciculating labeled lines that control 
specific aspects of autonomic physiology. The existence of 
dedicated channels in the vagus nerve for particular autonomic 
functions provides a streamlined flow of information that is 
similar to coding strategies used in other sensory systems. For 
example, in the gustatory system, different sensory neurons 
are devoted to detecting chemicals that evoke sweet, salty, 
sour, savory, and bitter sensations (Barretto et al., 2015;
Hellekant et al., 1998). Likewise, the somatosensory system contains
a diversity of sensory neuron types, including those that detect 
gentle touch or pain (Abraira and Ginty, 2013; Basbaum et al., 
2009). Here, the selective effects of P2ry1 neuron activation 
and the differential effects of Npy2r neuron activation indicate 
that functional segregation of vagal inputs can likewise begin in 
the periphery and persist in the brainstem, ultimately resulting 
in specific physiological responses. 

Obtaining genetic access to sensory neurons has provided a 
framework for studying a myriad of perceptions, from our external 
senses of smell, touch, taste, vision, and hearing to internal 
senses associated with hunger and satiety. For example, two in- 
termingled classes of hypothalamic neurons exert opposing ef- 
fects on hunger and identifying neuropeptide markers for these 
neurons provided a critical basis for studying the neural control 
of feeding (Elmquist et al., 1999). Here, we gain genetic access 
to two populations of breathing-control neurons, providing a
molecular and cellular framework for understanding the control
of respiration by the autonomic nervous system.

EXPERIMENTAL PROCEDURES

Animals

All animal procedures complied with institutional animal care and use
committee guidelines. P2ry1-ires-Cre, Npy2r-ires-Cre, and Gpr65-ires-Cre mice
were prepared using standard bacterial artificial chromosome (BAC) recombineering
approaches (Figure S1), as previously described (Krashes et al., 2014) (see
Extended Experimental Procedures). Chat-ires-Cre (006410), Chat-GFP
(007902), lox-ChR2 (012569), and lox-tdTomato (007914) mice were purchased
from Jackson (stock numbers in parentheses). Vglut2-ires-Cre and lox-L10-GFP
mice (Krashes et al., 2014; Vong et al., 2011) were generous gifts from Bradford
Lowell (Beth Israel Deaconess Medical Center). All Cre driver lines used are
viable and fertile, and no abnormal phenotypes were detected.

Receptor Expression Studies 

GPCRs were identified in the nodose/jugular complex using techniques
established for identifying olfactory receptors (Liberles and Buck, 2006; Liberles
et al., 2009). cDNA was prepared from acutely isolated and DNase-treated
nodose/jugular RNA and used as a template in qPCR reactions involving
primers that recognize ~400 endo-GPCRs (Table S1). In situ hybridization
studies were performed on 10 μm cryosections of nodose/jugular ganglia as
previously described (Ferrero et al., 2013; Liberles et al., 2009), except that data
in Figures S1 and S4C involved digoxigenin probes alternatively visualized
with peroxidase-conjugated anti-digoxigenin antibody and TSA-Plus-Cy5
(Perkin Elmer). Probes are described in Extended Experimental Procedures.

AAV Infections of the Nodose/Jugular Complex 

The left nodose/jugular complex of adult mice was surgically exposed under
anesthesia by making an incision along the ventral surface of the neck, followed by
blunt dissection. A micropipette containing a 1:1 mixture of AAV-flex-tdTomato
(Penn Vector Core, AV-9-ALL864; titer, 1.3 x 10^^ genome copies/ml)
and AAV-eGFP (Penn Vector Core, AV-9-PV1963; titer, 3.6 x 10^^ genome
copies/ml), as well as 0.05% Fast Green FCF (Sigma-Aldrich), was inserted
into the nodose/jugular complex. Virus solution (140 nl) was injected using a
Nanoject II injector (Drummond), and success was determined by Fast Green
dye filling of the ganglia. Animals recovered from surgery and were sacrificed
4 weeks later for tissue harvest.

Optogenetic Stimulation of the Vagus Nerve and Physiological 
Measurements 

Animals were deeply anesthetized (isoflurane, 1.5%-2%, Abbott Laboratories),
freely breathing, and maintained at normal body temperature. The left nodose/
jugular complex was surgically exposed, and an optic fiber (200 μm core,
Thorlabs) coupled to a DPSS laser light source (473 nm, 150 mW, Ultralaser)
was positioned for focal illumination beneath the ganglion and above the
pharyngeal and superior laryngeal branches. Light stimulation (5 ms pulses,
75-125 mW/mm² intensity; for respiratory and cardiovascular effects, 50 Hz for
10 s; for gastric pressure measurements, 5 Hz for 3-6 min) was controlled by a
shutter system (Uniblitz). Respiration rate was measured using an
amplifier-coupled pressure transducer (Biopac) cannulated into the trachea. A breath
was scored if lung volume increased to at least 10% of mean tidal volume.
Tidal volume was calculated by integrating airflow per breath, and minute
volume was calculated by multiplying tidal volume by respirations per minute.
Lung volume, used to determine the state of inhalation or exhalation, was
calculated by integrating airflow over time. Heart rhythm was measured by ECG,
recorded with two needle electrodes placed subcutaneously on the right
forepaw and the left hindpaw and amplified with a differential amplifier (A-M
Systems). To measure intraluminal gastric pressure, stomach contents were
emptied by introducing saline through an esophageal cannula and draining
through a pyloric cannula. An amplifier-coupled pressure transducer (Biopac)
was connected to a fluid-filled catheter and placed into the stomach through
the pyloric sphincter. Saline (400 μL) was introduced through the esophageal
cannula, and pressure measurements were acquired (1 kHz sampling, MP150
data acquisition system, Biopac). Data analyzed in 5 s bins (Figures 4C, 6B,
S5A, and S5C) were normalized by comparison to values obtained during a
30 s baseline period.
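The respiratory measures defined in this section reduce to a few arithmetic steps: scoring breaths against a threshold of 10% of mean tidal volume, computing minute volume as tidal volume times breaths per minute, averaging in 5 s bins, and normalizing to a 30 s baseline. A minimal sketch under stated assumptions: the inspired volume of each candidate breath has already been obtained by integrating airflow, normalization is expressed as percent change from the baseline mean, and all function names and numbers are hypothetical.

```python
def count_breaths(inspired_volumes, mean_tidal_volume, min_fraction=0.10):
    """Score a breath when its inspired volume reaches >= 10% of mean tidal volume."""
    threshold = min_fraction * mean_tidal_volume
    return sum(v >= threshold for v in inspired_volumes)


def minute_volume(tidal_volume, breaths_per_min):
    """Minute volume = tidal volume x respirations per minute."""
    return tidal_volume * breaths_per_min


def bin_means(samples, fs_hz, bin_s=5.0):
    """Mean of a uniformly sampled trace in consecutive bins (partial last bin dropped)."""
    n = int(round(fs_hz * bin_s))
    if n <= 0:
        raise ValueError("bin must contain at least one sample")
    return [sum(samples[i:i + n]) / n for i in range(0, len(samples) - n + 1, n)]


def percent_of_baseline_change(values, baseline_mean):
    """Express each binned value as percent change from the baseline mean."""
    return [100.0 * (v - baseline_mean) / baseline_mean for v in values]


# Hypothetical respiration-rate trace sampled at 1 Hz: 30 s baseline, then 10 s of light.
rate = [150.0] * 30 + [70.0] * 10
baseline_mean = sum(rate[:30]) / 30
print(percent_of_baseline_change(bin_means(rate[30:], fs_hz=1.0), baseline_mean))
```

Because minute volume is a product, fractional changes compound: a 54% drop in rate with a 52% drop in tidal volume leaves 0.46 × 0.48 ≈ 0.22 of baseline, close to the 79% decrease reported for Vglut2-ChR2 mice. Small discrepancies are expected when per-animal values are averaged, since the mean of products need not equal the product of means.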

Electrophysiology 

For whole nerve electrophysiology, the vagus nerve was cervically transected,
and the peripheral transected end was desheathed and placed onto a pair of
platinum-iridium electrodes (A-M Systems). Optical fibers were positioned
distally to illuminate the peripheral trunk, a ground electrode was placed on
nearby muscle, and the neck cavity was filled with halocarbon oil. Nerve activity
was monitored with an audio monitor (Grass), recorded with an alternating
current (AC) preamplifier (Grass; 1 kHz sampling rate unless otherwise
specified), and acquired on an MP150 data acquisition system (Biopac).
Compound action potentials in response to a 0.8 ms light stimulus were recorded
at 50 kHz sampling. Fiber conduction velocity was determined by varying the
distance between the optic fiber and recording electrode (travel distance); the
resulting time lags in peak maxima (Δt) were graphed as a function of travel
distance, revealing characteristic A and C fiber types. The ratio of A to C fibers
was calculated by integrating the corresponding peak areas in the compound
action potential. Because the A and C peaks were not completely separated in
most recordings, the reported A/C ratio underestimates the fold
enrichment.
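The A/C ratio described above integrates the compound action potential over two windows, one around each peak. A minimal sketch, assuming a baseline-subtracted trace, rectification with the absolute value, trapezoidal integration, and analyst-chosen window boundaries given as sample indices; the function names and the example trace are hypothetical. Because the tails of overlapping A and C components leak into each other's windows, such a ratio tends to be pulled toward 1, in line with the note above that the reported value underestimates enrichment.

```python
def peak_area(trace, dt, start, stop):
    """Trapezoidal area under |trace| between sample indices start and stop (inclusive)."""
    segment = [abs(x) for x in trace[start:stop + 1]]
    return sum(0.5 * (a + b) * dt for a, b in zip(segment, segment[1:]))


def a_to_c_ratio(trace, dt, a_window, c_window):
    """Ratio of A-peak area to C-peak area in a baseline-subtracted compound action potential.

    a_window and c_window are (start, stop) sample-index pairs chosen by the analyst.
    Overlap between the peaks leaks area across windows, compressing the ratio.
    """
    a_area = peak_area(trace, dt, *a_window)
    c_area = peak_area(trace, dt, *c_window)
    if c_area == 0:
        raise ValueError("C-fiber window has zero area")
    return a_area / c_area


# Hypothetical trace sampled at 50 kHz (dt = 0.02 ms): a tall early peak, a small late one.
trace = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0]
print(a_to_c_ratio(trace, dt=0.02, a_window=(0, 2), c_window=(3, 5)))  # prints 2.0
```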

Data Analysis 

Sample sizes are indicated in bar graphs (numbers in parentheses), and
significance was determined by comparison to the indicated control group using a
two-tailed Student's t test.
• RBP4 missense mutations: eye defects with reduced
penetrance and maternal inheritance 

• Heterozygotes have normal circulating RBP, but reduced 
vitamin A levels in serum 

• Dominant-negative RBPs bind retinol poorly, but occupy 
STRA6 with very high affinity 

• Skewed inheritance due to a functional restriction of 
placental vitamin A transport 
SUMMARY

Gestational vitamin A (retinol) deficiency poses a risk 
for ocular birth defects and blindness. We identified 
missense mutations in RBP4, encoding serum retinol 
binding protein, in three families with eye malforma- 
tions of differing severity, including bilateral anoph- 
thalmia. The mutant phenotypes exhibit dominant 
inheritance, but incomplete penetrance. Maternal 
transmission significantly increases the probability 
of phenotypic expression. RBP normally delivers 
retinol from hepatic stores to peripheral tissues, 
including the placenta and fetal eye. The disease mu- 
tations greatly reduce retinol binding to RBP, yet 
paradoxically increase the affinity of RBP for its cell 
surface receptor, STRA6. By occupying STRA6 non- 
productively, the dominant-negative proteins disrupt 
vitamin A delivery from wild-type proteins within the 
fetus, but also, in the case of maternal transmission, 
at the placenta. These findings establish a previously 
uncharacterized mode of maternal inheritance, 
distinct from imprinting and oocyte-derived mRNA, 
and define a group of hereditary disorders plausibly 
modulated by dietary vitamin A. 

INTRODUCTION 

Congenital eye malformations, including microphthalmia,
anophthalmia, and coloboma (MAC) disease, affect two in
10,000 births and are an important cause of childhood blindness
(Morrison et al., 2002). The severity depends on the timing and
extent to which growth and morphogenesis of the developing eye
are disrupted (Graw, 2010). Anophthalmia, or total absence of
eyes, is the most extreme form. Microphthalmia (small eyes) 
and coloboma (ventronasal notch-like defects in the iris and/or 
retina, arising from incomplete closure of the choroid fissure, 
see Onwochei et al., 2000) are less severe. These can occur as 
uni- or bilateral abnormalities, and may coexist in an individual
or pedigree. Most cases are isolated, but one-third are associ- 
ated with systemic birth defects. Few genetic causes have 
been identified (Williamson and FitzPatrick, 2014). Loss-of-func- 
tion SOX2 mutations account for 10% of bilateral anophthalmia 
(Fantes et al., 2003), whereas mutations in RX, CHX10, BCOR, 
HCCS, and PAX6 transcription factors explain other monogenic 
cases. Signaling pathways mediated by BMP4, GDF6, and SHH 
may also be genetically disrupted. Finally, disease risk is 
affected by environmental factors, such as maternal nutrition 
(Hornby et al., 2003). 

Vitamin A is an essential, fat-soluble nutrient for embryonic 
development, tissue homeostasis, and physiology. Its most 
widely recognized function is to supply the visual cycle with 
11-cis-retinal (vitamin A aldehyde) for generation of the light-sensitive
visual pigment rhodopsin (Lamb and Pugh, 2004). Consequently,
vitamin A deficiency (VAD) first manifests as night
blindness (nyctalopia), a reversible loss of visual adaptation to 
dark environments (Dowling and Wald, 1958). Vitamin A is also 
required for epithelial, reproductive, and immune health. At the 
molecular level, vitamin A is a substrate for synthesis of retinoic 
acid (RA), a potent signaling molecule needed for vertebrate 
organogenesis, including eye development (Duester, 2009; Nie- 
derreither and Dolle, 2008). Nutritional studies have long associ- 
ated maternal vitamin A deficiency with eye malformations, as 
well as urogenital, diaphragmatic, cardiovascular, and pulmo- 
nary defects (Hale, 1933; See and Clagett-Dame, 2009; Wilson 
et al., 1953). Recently, genetic links were established between 
retinoid signaling defects and MAC disease. Loss-of-function 
mutations in STRA6, encoding the membrane receptor for serum 
retinol binding protein (RBP), cause autosomal recessive anophthalmia
or Matthew-Wood syndrome (OMIM 601186), characterized
by structural eye defects, diaphragmatic hernias, cardiac
malformations, and pulmonary hypoplasia (Golzio et al., 2007; 
Pasutto et al., 2007; Casey et al., 2011; Chassaing et al., 
2009). Likewise, mutations in ALDH1A3, encoding retinaldehyde
dehydrogenase, account for a subset of recessive MAC cases 
(T.G., C.M.C., A.S., T.B., and N.M. Ghiasvand, unpublished 
data; Fares-Taie et al., 2013; Yahyavi et al., 2013). 
Figure 1. Familial MAC Disease

(A) Family 1 pedigree with two probands (arrows). 
There are 11 family members with microphthalmia
or coloboma (gray symbols) or clinical anoph- 
thalmia (black symbols) and 9 obligate carriers 
(dotted symbols). 

(B) Anterior eye and fundus photographs of family 
members with iris or chorioretinal colobomas 
(VI-2, VII-2, and VII-3), microphthalmia (III-12 and
VII-2), or bilateral clinical anophthalmia (VII-5) and 
orbital MRI views of VII-5. The T2w coronal MRI 
shows extraocular muscles (red arrowheads), but 
absent eye globes. The T2wFS axial image shows 
bilateral hyperintense orbital cysts (yellow arrowheads);
optic nerve head, onh; chorioretinal coloboma, crc;
left, L; and right, R. See also Table S1.

(C) Genetic mapping of MAC disease. (Top)
Multipoint LOD plot of autosomes, based on
affected individuals and obligate carriers. (Bottom)
Expanded linkage analysis favors chromosome 10
localization. See also Figures S1 and S2.
Here, we show that mutations in the serum RBP gene underlie an
autosomal dominant form of MAC that is transmitted with incom- 
plete penetrance and a unique maternal parent-of-origin effect 
(Sturtevant, 1923). We further show that the unliganded mutant 
RBPs bind STRA6 with much greater affinity than wild-type, 
and consequently are likely to disrupt delivery of vitamin A to 
target cells, consistent with a dominant-negative effect. These 
results shed light on the maternal-fetal nutritional interface, ge- 
netic susceptibility to vitamin A deficiency, and the etiology of 
eye malformations. 

RESULTS 

Autosomal Dominant MAC Disease with Reduced Penetrance and a Maternal Effect

A seven-generation pedigree (Family 1) was identified through
two probands with anophthalmia (Figure 1A). The penetrance of
eye disease is incomplete (P = 0.4), based on 54 informative
meioses (Figure S1A). Carriers have phenotypes ranging from
normal to microphthalmia to complete absence of the eyes
(Figure 1B; Table S1). Several individuals have iris and/or
chorioretinal colobomas. Transmission is skewed: nearly all
affected individuals (10 of 11) inherited the trait from their
mother, such that maternal penetrance is significantly greater
than paternal penetrance (Pmat = 0.7, Ppat = 0.1; Figure S1B).
In the only instance of paternal transmission, one of two
monozygotic twins (VI-2) was affected.



A New MAC Locus on Chromosome 10q23

We first excluded 23 loci associated with MAC in humans or
vertebrate models (Table S2) by comparing haplotypes of the two
probands. We then examined available family members and performed
genome-wide multipoint linkage analysis (Figures S1C and S1D). We
applied a simple autosomal dominant (AD) model, scoring only
affecteds and obligate carriers. This analysis suggested three
candidate regions: 1q41, 10q23, and 19p13, with peak LOD scores
>2 (Figure 1C). To rank these regions, we included at-risk
unaffected family members and applied AD models with uniform
(Pglobal = 0.4) or sex-specific (Pmat = 0.7, Ppat = 0.1)
penetrance. This indicated a chromosome 10q23 localization with
a peak LOD score of 3.01 (Figure 1C). The 8.2 megabase (Mb)
nonrecombinant interval contains 81 genes (Figure S2). Given the
importance of vitamin A in eye development (Warkany and
Schraffenberger, 1946) and the eye malformations associated with
STRA6 and ALDH1A3 mutations (T.G., C.M.C., A.S., T.B., and N.M.
Ghiasvand, unpublished data), we tested genes in the critical
region with roles in vitamin A transport (RBP4) and RA
metabolism (CYP26A1 and CYP26C1).
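For orientation, the LOD score reported above is the base-10 logarithm of the odds that a marker and the disease locus are linked at recombination fraction θ rather than unlinked (θ = 0.5). A LOD of 3 corresponds to 1000:1 odds in favor of linkage. The sketch below computes only the textbook two-point LOD for phase-known, fully informative meioses. It illustrates the quantity and does not reconstruct the multipoint, penetrance-adjusted analysis used here. The function name `two_point_lod` is hypothetical.

```python
import math


def two_point_lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses:
    log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    n = recombinants + nonrecombinants
    log_linked = nonrecombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = n * math.log10(0.5)  # likelihood under no linkage
    return log_linked - log_unlinked


# Illustrative input: 1 recombinant among 10 informative meioses, at theta = 0.1
print(round(two_point_lod(1, 9, 0.1), 2))
```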
Figure 2. RBP4 Mutations in Three Independent Families with
Congenital Eye Malformations

(A) Map of the 9.4 kilobase RBP4 gene, with signal
sequence (gray), mature protein (black) coding 
regions, and MAC mutations (red box). 

(B) Sequence chromatograms showing heterozy- 
gous missense mutations, with maternal trans- 
mission in each pedigree. 

(C) Primary structure of translated RBP with ala-to- 
thr substitutions (red) in the mature polypeptide 
(yellow bar) and two alleles associated with 
recessive nyctalopia (gray). Note that A73T and 
A75T in the primary translation product correspond 
to A55T and A57T following cleavage of the signal 
sequence (gray bar, SS); cyan coils, α-helix; blue
arrows A-H, β strands forming the β-barrel.

(D) Ribbon diagrams showing positions of dominant
(red) and recessive (gray) substitutions. Eight
anti-parallel strands (dark blue) form the ligand
pocket. Three loops (green) at the calyx opening
contact TTR. The N terminus is relatively
unconstrained. See also Figure S3D.

(E) Alignment showing conservation of alanines 55 
and 57 among vertebrates. 




Dominant RBP4 Mutations in Three Unrelated MAC Families

RBP4 encodes serum RBP (Kanai et al., 1968) and contains six
exons (Figure 2A). Exon screening revealed a missense mutation
(c.223G>A, p.A75T) that cosegregated with the disease trait
(Figure 2B) and was not found in >11,330 control chromosomes.
We then screened a cohort of 75 unrelated MAC samples and
discovered mutations in two cases, a male with bilateral
anophthalmia and neurodevelopmental delay (Family 2), and a
female with left microphthalmia and coloboma (Family 3). They
share a single missense allele (c.217G>A, p.A73T) on distinct
haplotypes, indicating recurrence of the mutation, with maternal
transmission in both families (Figure S3).



p.A73T and p.A75T Alter the Retinol-Binding Surface

RBP mobilizes all-trans retinol from liver stores to target tissues,
including the retinal pigment epithelium and placenta
(D'Ambrosio et al., 2011). As the archetypal lipocalin (Newcomer
and Ong, 2000), RBP folds as a β-barrel with a central hydrophobic
ligand cavity (Figures 2D and S3) (Cowan et al., 1990; Zanotti et
al., 1993). Both mutations substitute threonine for alanine, in
codons 73 and 75 of β strand C (Figure 2C), corresponding to
residues 55 and 57 in the mature polypeptide. These alanines face
the ligand pocket (Figure 2D), contact carbons C4 and C3 of the
retinol β-ionone ring, respectively (Cowan et al., 1990), and are
completely conserved among vertebrates (Figure 2E).

Two previously reported RBP4 mutations, p.I59N and p.G93D, were
associated with recessive night blindness in compound
heterozygous sisters (Biesalski et al., 1999). They correspond to
I41N and G75D in β strands B and D of the mature protein, after
signal peptide cleavage. These residues also interact with side
groups of the β-ionone ring, and biochemical data suggest that
G75D and I41N proteins bind retinol poorly (Folli et al., 2005).
Molecular modeling shows that A55T and A57T proteins can
accommodate retinol, but under increased strain due to steric,
hydrophilic, and H-bonding effects of the threonine side chain
(Figure S3). To understand the allelic heterogeneity and
pathogenic basis of MAC disease, we systematically compared
properties of wild-type (WT) and mutant RBPs.



A55T and A57T Proteins Are Stably Secreted 

RBP is constitutively expressed by hepatocytes, retained in
the ER, and secreted into the bloodstream as holo RBP (Muto
et al., 1972; Soprano and Blaner, 1994), stabilized by three
disulfide bonds (Selvaraj et al., 2008). We first evaluated how
missense mutations affect RBP synthesis and secretion in
transfected HeLa cells (Melhus et al., 1992) by western analysis,
using an N-terminal hemagglutinin (HA) tag (Figure 3A). The size
(21 kilodalton [kDa]) and abundance of A55T and A57T proteins in
48 hr conditioned media (CM) were indistinguishable from WT
(Figure 3B). In contrast, G75D and I41N proteins migrated as
42 kDa dimers, or larger multimers (I41N), linked by
intermolecular disulfide bonds (Figures 3B and S4). We confirmed
this result using glutaraldehyde cross-linked CM, and compared
intracellular RBP levels, including an ER retention control
(Figure 3C). Intracellular G75D and, to a greater extent, I41N
were elevated, suggesting a partial secretion defect, with no
evidence of ER stress (Figure S4B). We conclude that A55T
and A57T are secreted as stable 21 kDa monomers, whereas
G75D and I41N misfold in the ER, aggregate, and exhibit
increased cellular retention.

A55T and A57T Complex Normally with Transthyretin 

Under normal conditions, holo RBP and transthyretin (TTR), a
60 kDa homotetramer (Heller and Horwitz, 1974), are cosecreted
in a 1-to-1 molar ratio as a 76 kDa complex (Kanai et al., 1968).
The large size of this complex prevents renal filtration, allowing
RBP to remain in circulation (Soprano and Blaner, 1994; van
Bennekum et al., 2001). In coimmunoprecipitation experiments
(Figures 3D and 3E), human TTR interacted strongly with WT, A55T,
and A57T proteins, but poorly with G75D or I41N. Similar results
were obtained for bovine TTR, present in the media supplement.
To quantitatively assess the RBP-TTR interaction, we performed
reciprocal surface plasmon resonance (SPR) assays with purified
TTR and recombinant HA-RBP or his-RBP (Figure 3F). WT holo
RBP bound TTR with 2- to 3-fold greater affinity than apo RBP,
giving mean steady state Kd values of 0.9 and 2.2 µM,
respectively, similar to previous reports (Folli et al., 2010;
Malpeli et al., 1996). The affinity of A55T and A57T mutant RBPs
was similar to or slightly lower than that of WT in buffered saline
(HBS). However, inclusion of nonionic surfactant (0.005% Tween)
significantly reduced holo A55T affinity for TTR, presumably by
removing retinol (p < 0.001, unpaired t test, df = 10; see below).

The in vitro behavior of G75D and I41N proteins is consistent
with the absence of immunodetectable serum RBP in p.G93D/p.I59N
compound heterozygotes and the reduction of RBP in the p.I59N/+
parent, in the setting of normal TTR levels (Biesalski et al.,
1999). Conversely, RBP and TTR levels in p.A75T/+ (Family 1,
VI-2, VI-3, and VI-7) and p.A73T/+ (Family 3, II-2) carriers were
within the normal range (Table S3).

WT and A57T Proteins Coexist in p.A75T/+ Carrier Plasma

To assess the ratio of allotypes in vivo, total RBP was purified
from obligate carrier VI-2 plasma (Figure S5), digested with
trypsin, and analyzed by mass spectrometry (Figure 4). The
predicted WT and A57T peptides encompassing amino acid 57
differ by 30 Dalton. Accordingly, we identified MALDI-TOF
peaks in the 3,100 to 3,220 mass-to-charge ratio (m/z) range
corresponding to WT and A57T tryptic peptides, with a 2-to-1
intensity ratio (Figure 4C). These were verified by tandem mass
spectrometry (MS/MS) analysis (data not shown) and parallel MS
of HA-RBP controls. Since the peptides ionize with equal
efficiency (Figure 4D), we conclude that A57T constitutes
one-third of circulating RBP in p.A75T/+ heterozygotes.
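Both numerical steps in this paragraph can be checked with a short calculation. The sketch below assumes standard monoisotopic element masses and, as stated above, equal ionization efficiency for the two peptides. It computes the Ala-to-Thr residue mass shift and converts a WT:mutant peak-intensity ratio into the mutant fraction. The helper names are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

ALA = {"C": 3, "H": 5, "N": 1, "O": 1}  # alanine residue, C3H5NO
THR = {"C": 4, "H": 7, "N": 1, "O": 2}  # threonine residue, C4H7NO2


def residue_mass(formula: dict) -> float:
    """Monoisotopic mass of an amino-acid residue (free amino acid minus H2O)."""
    return sum(MASS[element] * count for element, count in formula.items())


def mutant_fraction(wt_intensity: float, mut_intensity: float) -> float:
    """Fraction of mutant protein, assuming both peptides ionize equally well."""
    return mut_intensity / (wt_intensity + mut_intensity)


# Thr carries an extra CH2O relative to Ala
shift = residue_mass(THR) - residue_mass(ALA)
print(f"A->T shift: {shift:.4f} Da")                  # A->T shift: 30.0106 Da
print(f"A57T fraction: {mutant_fraction(2, 1):.3f}")  # A57T fraction: 0.333
```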

Because both allotypes were present in carrier plasma,
genomic imprinting is unlikely to explain the skewed
transmission of the MAC disease (Figure S1B). This conclusion is
supported by RT-PCR analysis of F1 mice, which showed comparable
levels of allelic Rbp4 mRNA transcripts in adult and fetal
tissues (Figure S4C).

In principle, the unequal ratio of allotypes could be explained
by a difference in renal filtration. Under normal circumstances,
RBP dissociates from TTR when retinol is delivered to tissues
(Malpeli et al., 1996). Most of the resulting apo RBP is filtered
and metabolized by the kidney, but trace amounts are detected
in urine, at 1% of serum levels (Raila et al., 2005), and are
assumed to be proportional to the RBP content of the glomerular
ultrafiltrate. To test this hypothesis, we evaluated RBP
allotypes in p.A75T/+ carrier urine by MS, but found no evidence
for increased urinary elimination of A57T relative to WT
(Figure S6).

A55T and A57T Proteins Bind Vitamin A Poorly 

We tested retinol-binding properties of mutant RBPs using two
assays, double radioisotope labeling and fluorescence
enhancement. HeLa cells expressing WT or mutant HA-RBP were
exposed to 35S-met/cys and 3H-retinol, and the 3H/35S ratio was
determined for HA-RBP immunopurified from CM (Figures 5A and
5B). We observed a dramatic reduction in retinol binding, as
predicted by molecular modeling (Figure S3E). A55T bound
negligible 3H-retinol, whereas A57T bound 16% of WT levels. G75D
and I41N mutants also bound very little vitamin A, as expected
given their misfolded structures. RBP activity is evidently more
sensitive to a threonine substitution at position 55 than 57,
consistent with X-ray data placing retinol closer to Ala55 (3.6 Å)
than Ala57 (4 Å) (Cowan et al., 1990).

Retinol fluorescence intensity increases 15-fold when it oc- 
cupies the RBP ligand pocket (Cogan et al., 1976). Accordingly, 
we added 1 to 5,000 nanomolar (nM) retinol to apo HA-RBP,
purified under native conditions, and measured fluorescence 
(excitation [ex] 330 nanometer [nm] and emission [em] 460 nm) 
in PBS (Figure 5C, filled symbols). Surprisingly, A55T and A57T 
both bound retinol well in this assay, with affinities similar to 
WT (Kd ~80 nM). These results are consistent with SPR analysis 
of holo and apo forms interacting with TTR in HBS (Figure 3F), 
but differ sharply from the radioisotope data showing the mu- 
tants bind little or no vitamin A (Figure 5B). 
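In this fluorescence titration, the protein concentration (380 nM, according to the Figure 5C legend) is well above the reported Kd of about 80 nM. Much of the added retinol is therefore bound, so free ligand cannot be equated with total ligand. A common way to model such data is the exact (quadratic) solution for 1:1 binding with ligand depletion. This model is offered here as an assumption, not as the authors' fitting procedure. The function name is hypothetical, and the printed values are model predictions, not measurements.

```python
import math


def fraction_protein_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of protein occupied for 1:1 binding, accounting for ligand depletion.

    Solves Kd = (P - C)(L - C) / C for the complex concentration C, taking the
    physically meaningful (smaller) root of C^2 - (P + L + Kd) C + P L = 0.
    """
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4 * p_total * l_total)) / 2
    return complex_conc / p_total


# Titration of 380 nM protein with 1-5,000 nM retinol, assuming Kd = 80 nM
for retinol_nm in (1, 100, 380, 1000, 5000):
    print(retinol_nm, round(fraction_protein_bound(380, retinol_nm, 80), 3))
```

Fitting measured fluorescence to this expression, scaled by an amplitude, would return Kd. When the protein concentration is far below Kd, the expression reduces to the familiar hyperbola L/(Kd + L).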

WT holo RBP is relatively resistant to temperature, pH extremes,
and nonpolar solvents (Cogan et al., 1976; Raz et al., 1970), but
sensitive to low ionic strength (Peterson, 1971). Our disparate
findings may be reconciled if A55T and A57T substitutions
destabilize RBP contacts with retinol, particularly under adverse
environmental conditions, increasing the probability that ligand
is released to the solvent. Whereas the initial fluorescence assay
was performed in PBS (Figure 5C, closed symbols), our 3H-retinol
binding assay involved sequential washes in PBS containing 1%
Triton X-100 and 0.5% deoxycholate (Figure 5A). We therefore
systematically tested retinol binding in nonpolar and amphipathic
environments (Figure 5D), including a dispersion of
phosphatidylcholine (PC) vesicles (Figure 5C, open symbols), to
more closely approach in vivo conditions. Within the ER, bloodstream, and
Figure 3. A55T and A57T Proteins Are Secreted as Stable RBP Monomers and Interact with TTR

(A) Test of synthesis, secretion, and integrity. 

(B) HA western analysis of CM, electrophoresed under native or denaturing conditions, before or after crosslinking. 

(C) Western blot of cell lysates, with α-tubulin loading control.

(D) RBP-TTR binding assay in tissue culture. Brown circles, TTR homotetramers and blue bars, RBP monomers. 

(E) Western blot of HA immunoprecipitates probed in sequence with TTR and HA antibodies. 

(F) SPR analysis of RBP-TTR binding in vitro. (Top) Sensorgrams show a TTR concentration series interacting with apo his-RBPs on a biotin capture chip. (Middle) 
Steady state isotherms for apo and holo HA-RBP binding to TTR. (Bottom) Histogram of Kd values. Error bars give the SEM for nonlinear regression. See also
Figure S4. 
Figure 4. MS of RBP Proteotypes in
p.A75T/+ Carrier Plasma

(A) Tryptic peptides encompassing residue 57. 
Modified peptides (asterisks) arise from alkylation
of methionine 53 (♦).

(B) MALDI-TOF spectrum of RBP from control
human plasma, indicating the critical m/z region
(red box). The y axis (ions detected) reflects
relative intensity.

(C) Expanded view of control (top) and carrier 
(bottom) spectra from 3,100 to 3,250 m/z. Single- 
ionization peaks corresponding to WT (red lines) 
and A57T (green lines) proteins are marked. 

(D) MALDI-TOF spectra for recombinant HA-RBP.
The invariant 3,223.3 m/z peak (human keratin,
a common contaminant) serves as an internal 
standard. See also Figures S5 and S6. 
tissue interstitial space, RBP is continuously exposed to
phospholipid membranes and lipoprotein particles (van Meer et al.,
2008). Indeed, the retinol-binding activity of the mutant proteins
was hypersensitive to ethanol, detergents, and phospholipid
vesicles, following an A55T > A57T > WT allelic series. Almost no
retinol was bound by A55T in 0.1% PC (Kd > 30 µM).

Our in vitro data predict that RBP4 heterozygotes may have
reduced circulating retinol. Indeed, three p.A75T/+ obligate
carriers had fasting serum vitamin A levels below the lower
normal limit (Table S3), ranging from 50% to 60% of the reference
mean, and plasma retinol fluorescence was reduced (Figure S5B).

Increased Binding of A55T and A57T Proteins to the STRA6 Receptor

STRA6, or stimulated by RA 6 (Bouillet et al., 1997), is the
transmembrane receptor for RBP that mediates cellular uptake of
vitamin A (Kawaguchi et al., 2007). At target tissues, holo RBP
binds STRA6 extracellular loop 6 with high affinity (Kawaguchi
et al., 2008). Following transfer of vitamin A into cells, apo RBP
dissociates from the receptor, allowing a new holo RBP molecule
to dock (Kawaguchi et al., 2007).

To examine binding of A55T and A57T proteins to STRA6, we
performed two sets of experiments. We first applied 35S-labeled
apo WT, holo WT, A55T, or A57T RBP in parallel to human embryonic
kidney (HEK) 293T cells transfected with STRA6-myc or control
expression vectors and measured 35S-RBP bound after 1 hr
(Figures 6A-6C). In this assay, apo RBP had 3-fold lower steady
state binding than holo RBP. More dramatically, STRA6+ cells
bound 4 to 7 times more mutant apo RBP than WT holo RBP
(p < 0.002, unpaired t tests, df = 4). These findings, and the
mass spectrometry data (Figure 4), suggest that competition may
occur between mutant and WT RBP molecules at STRA6 receptors in
vivo. To explore this possibility, we mixed 8- to 250-fold excess
unlabeled holo WT with 35S-labeled HA-RBP in parallel assays. In
each case, the unlabeled WT competitor displaced much less mutant
35S-RBP than expected if the binding affinities were equivalent.

To characterize the STRA6-RBP interaction more precisely,
we determined the binding affinity (Kd) and rate constant for
the approach to equilibrium of mutant and WT RBPs, using a
sensitive ELISA method (Figure S7) to measure HA-RBP bound
to cells and released into the media. These assays show that
the mutant proteins have a 30- to 40-fold greater affinity for
STRA6 than WT (Figure 6D), with Kd values of 1.9 nM (A55T)
and 1.5 nM (A57T) compared to 59 nM (WT; p < 0.001, unpaired
t tests, df = 6). In principle, two kinetic mechanisms
can explain this striking result, which is central to disease
Figure 5. A55T and A57T Proteins Bind Retinol Poorly in a Mixed Aqueous-Lipid Environment

(A) 3H-retinol binding assay.

(B) (Left) 3H-retinol binding data normalized to WT. (Right) Autoradiogram showing secreted RBP in CM.

(C) (Left) In vitro retinol binding profiles for WT and mutant HA-RBP, measured by fluorescence in PBS ±0.1% α-L-PC, with 380 nM protein. (Right) Histogram
showing similar Kd values in PBS. 

(D) (Left) Normalized retinol binding curves in PBS with 0 to 50% ethanol. (Right) Increased sensitivity of mutant RBPs in an amphipathic environment, measured
by loss of retinol fluorescence after exposure to detergent micelles (1% Tx-100, 0.5% DOC) in PBS. Error bars show the SD (fluorescence plots) or SEM 
(histograms) for three parallel assays. 



pathogenesis: either the mutant RBP-STRA6 complex dissociates
more slowly or the mutant RBPs bind the receptor more
rapidly. To distinguish these possibilities, we measured the
release of RBP from STRA6+ and control cells at 25°C and
37°C (Figure 6E) and calculated forward (kon) and reverse (koff)
rate constants. As these data show, the major consequence of
the mutations is to increase kon by 25- to 50-fold (p < 0.001,
unpaired t tests, df = 42), with no significant change in koff
(Figure 6F; Table S4). The pathogenic RBPs thus bind STRA6 with
much higher affinity than WT, yet carry little or no vitamin A.
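The kinetic and equilibrium measurements are linked by the single-site relation Kd = koff/kon. With koff unchanged, a given fold increase in kon lowers Kd by the same factor. The quick check below uses only the Kd values reported above (1.9, 1.5, and 59 nM). The resulting fold changes fall within the stated 30- to 40-fold range and inside the 25- to 50-fold range of kon changes. The dictionary and function names are illustrative, and the rate constants in the second part are arbitrary numbers, not measured values.

```python
# Equilibrium dissociation constants reported in the text (nM)
KD_NM = {"WT": 59.0, "A55T": 1.9, "A57T": 1.5}


def fold_affinity_gain(kd_wt: float, kd_mut: float) -> float:
    """Fold increase in affinity (fold decrease in Kd) of a mutant relative to WT."""
    return kd_wt / kd_mut


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Kd = koff / kon for a simple bimolecular interaction."""
    return k_off / k_on


for name in ("A55T", "A57T"):
    print(name, round(fold_affinity_gain(KD_NM["WT"], KD_NM[name]), 1))

# With koff held fixed, raising kon 30-fold lowers Kd 30-fold
# (arbitrary illustrative rate constants).
base = kd_from_rates(k_on=1.0, k_off=59.0)
faster = kd_from_rates(k_on=30.0, k_off=59.0)
print(base / faster)
```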

DISCUSSION 

Here, we identify RBP4 mutations as the cause of autosomal
dominant MAC with incomplete penetrance and skewed
maternal transmission. These findings demonstrate a new
mode of inheritance in mammals, whereby phenotypic expression
is governed by maternal genotype. Our conclusions are
supported by linkage analysis, the discovery of independent
alleles, evolutionary conservation, the established role of
vitamin A in eye morphogenesis, and convergent biochemical,
functional, modeling, and clinical data, which demonstrate that
A55T and A57T proteins have impaired retinol binding but resist
renal filtration and interact strongly with STRA6. Together,
these data provide a simple but elegant mechanism for disease
pathogenesis.

A Unified Disease Model 

A55T and A57T RBPs act as dominant-negative proteins, most
likely by blocking vitamin A delivery at the STRA6 receptor
(Figure 7A). Mutant and WT proteins coexist in plasma (Figure 4)
and are therefore both secreted. Following translation, A55T
and A57T proteins may transiently bind vitamin A in the
hepatocyte ER, but if so, are likely to lose a significant
fraction of their retinol content in the amphipathic environments
of the ER-Golgi compartment and bloodstream (Figure 5). They are
otherwise stable and partner with TTR (Figure 3). At the target
cell, mutant RBPs bind STRA6 receptors more avidly than WT
(Figure 6), with faster association kinetics, increased affinity,
and thus longer net occupancy, creating a molecular restriction
point. Consequently, delivery of vitamin A from holo RBP should
be disrupted.

When the RBP4 mutation is transmitted by the mother, this
bottleneck effect is iterated twice: first at the placenta,
involving maternally derived RBP, and later at the developing eye
primordia, involving fetal-derived RBP (Figure 7B). In this
setting, retinol delivery to fetal tissues may be dramatically
reduced, and penetrance of eye phenotypes increased, compared to
paternal transmission of the same mutation, creating a maternal
inheritance pattern that resembles genomic imprinting but does
not involve chromatin or DNA modification. This model is
supported by data showing that STRA6 is localized in the placenta
and fetal eye (Bouillet et al., 1997; Kawaguchi et al., 2007) and
that maternal RBP does not cross the placental barrier in mice
(Quadro et al., 2004). Furthermore, RBP is expressed in
extraembryonic tissues that directly participate in retinol
transfer across the maternal-fetal interface, including the
visceral yolk sac (Johansson et al., 1997; Sapin et al., 1997;
Soprano et al., 1986; Ward et al., 1997). Recently, STRA6 has
been shown to mediate
Figure 6. A55T and A57T Proteins Bind the STRA6 Membrane Receptor with Greater Affinity Than WT

(A) STRA6 radioligand binding assay. 

(B) STRA6-myc expression in HEK293T cultures. (Left) Fluorescence micrographs of transfected cells immunostained for myc (green) with nuclear counterstain
(blue). Scale bars, 40 µm. (Right) Western blot simultaneously probed with antibodies to human STRA6 (72 kDa) and α-tubulin (50 kDa).

(C) Histogram showing binding of 5 pM 35S-labeled WT, A55T, or A57T HA-RBP proteins in the absence (black) or presence (gray) of 8- or 250-fold excess
unlabeled (cold) holo WT competitor.

(D) Quantitative equilibrium analysis of RBP-STRA6 interaction by immunoassay. (Left) Binding isotherms and reciprocal plots of apo HA-RBP ELISA data. Relative
RBP levels are given in counts per second (cps) emitted light, after subtracting nonspecific binding to control cells. (Right) Histogram of Kd values. The mutant
RBPs bind to the receptor with 30- to 40-fold greater affinity than WT.

(E) Kinetic analysis of the RBP-STRA6 interaction. Release of bound apo HA-RBP to the media over time, from saturated STRA6+ cells at 25°C.

(F) Histogram comparing STRA6 association (kon) and dissociation (koff) rate constants calculated from binding data. The A55T and A57T mutations greatly 
increase the on rate for RBP binding to STRA6. See also Figure S7; Table S4. Error bars show the SEM for three parallel assays or nonlinear regression of triplicate 
measurements. 



retinol efflux from cells as well as influx, loading extracellular apo
RBP with cytoplasmic vitamin A (Kawaguchi et al., 2012). This
bidirectional mode may be critical during early development,
as fetal RBP originating from the visceral yolk sac or liver can,
in principle, ferry retinol stepwise between different STRA6+
cells. Because the mutant RBPs are predicted to disrupt
STRA6 docking on both sides, this relay mechanism may be
highly sensitive to dominant-negative effects. The labyrinthine
zone of the murine chorioallantoic placenta, for example, is a
major site of maternal-fetal exchange that strongly expresses
STRA6, but not RBP (Bouillet et al., 1997; Johansson et al.,
1997). Likewise, in humans, RBP is expressed by the maternal
decidua, but not by villous trophoblasts (Johansson et al., 1999).

When transmitted by the father, the RBP4 mutation can only 
disrupt vitamin A transfer beyond the placenta. Consequently, 
the severity of fetal VAD, and the genetic penetrance from males, 
should be comparatively low. Clinical phenotypes may only be 
expressed when vitamin A supplied to the placenta is dimin- 
ished, notably in twin gestation (individual VI-2), where retinol 
input is divided between two embryos. 



Structural Basis for Enhanced STRA6 Binding 

RBP is the archetypal lipocalin, an ancient protein family
represented in nearly all life forms, including mammals,
invertebrates, fungi, and eubacteria (Flower, 1996; Newcomer and
Ong, 2000). Its ligand pocket is formed by eight anti-parallel beta
strands (A-H) with alternating hydrophilic and hydrophobic
amino acids, the latter stabilizing retinol. The orientation of the
A-B loop, specifically G34-L35-F36-L37, is the only major
structural difference between apo and holo RBP crystals at neutral
pH (Zanotti et al., 1993). Threonine substitutions at Ala55 or Ala57,
conserved sites deep within the pocket, impair retinol binding
and, paradoxically, enhance STRA6 binding. Because these
sites are located in the interior of the protein and thus unlikely
to contact STRA6, the mutations must increase receptor binding
indirectly, by altering RBP conformation. The striking decrease in
Kd is driven by a large increase in the association rate constant
(kon) with no apparent change in dissociation kinetics (koff). While
relatively unusual (Anderson et al., 1998), a small number of
protein-receptor affinity mutations are known to specifically affect
kon (Lahti et al., 2011; Lengyel et al., 2007). Our findings strongly
Figure 7. Model for Disease Pathogenesis,
Dominant Inheritance, and Maternal Effect on Penetrance

(A) RBP life cycle in WT (top) and heterozygous 
(bottom) individuals. In mutation carriers, A55T or 
A57T are cosecreted with WT proteins from the 
liver and/or extraembryonic tissues (yolk sac). 
Each RBP circulates in the maternal or fetal 
bloodstream in a stable complex bound to TTR, 
but most of the mutant proteins lack retinol. Upon 
reaching target tissues, the mutant RBPs bind 
STRA6 receptors with much higher affinity than 
WT, acting as dominant-negative particles that 
block vitamin A delivery. 

(B) Basis for maternal inheritance. Skewed pene- 
trance arises from functional “bottlenecks” that 
occur at sequential RBP-STRA6 interaction sites 
in the placenta and fetal eye. Disruption of vitamin 
A transfer at both levels, coupled with low maternal 
dietary retinoids (orange and red lines), predisposes
the fetus to MAC disease when the trait is maternally
transmitted. Vitamin A deficiency, VAD.
suggest that RBP-STRA6 docking involves a conformational
adaptation of RBP and that this initial step, rather than diffusion,
limits the binding reaction, consistent with a selected-fit model
(Weikl and von Deuster, 2009). Complementary changes in
STRA6 folding may further stabilize the ligand-receptor complex.

The RBP lipocalin undergoes reversible transformation to a 
molten globule state as pH or solvent polarity is reduced (Calder- 
one et al., 2003; Greene et al., 2006). This cooperative unfolding 
has been proposed to occur naturally in the local acidic environment at the cell surface, favoring retinol release (Bychkova et al., 1998), and may be potentiated by interaction with STRA6. We propose that the A55T and
A57T mutations, by altering the shape, 
polarity, and hydrophilicity of the retinol 
pocket, lower the activation energy for 
this transition. Consequently, a significant 
fraction of the mutant RBP population 
may exist in a partially melted state under 
normal physiological conditions. These 
molecules, which may resemble WT intermediates in the RBP-STRA6 binding
reaction, presumably account for the 
enhanced retinol release observed in the 
presence of organic solvents, surfac- 
tants, or phospholipid vesicles. Indeed, 
retinol dissociates from the mutant RBPs with biphasic kinetics in PBS following addition of Triton X-100 (Tx-100) and deoxycholate (DOC), consistent with the existence of at least two discrete holo conformations (Figure 5D).
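A single population releasing retinol would follow one exponential decay; biphasic kinetics instead imply at least two populations with different release rates. The Python sketch below uses hypothetical parameters (not fitted to the data in Figure 5D) to show the signature of such a mixture: the apparent first-order rate constant starts near a fast value and drifts toward the slow one as the fast-releasing population is depleted.

```python
import math


def fraction_bound(t: float, a_fast: float, k_fast: float, k_slow: float) -> float:
    """Two-population (biphasic) release: a fraction a_fast of holo protein releases
    retinol with rate constant k_fast, the remainder with k_slow."""
    return a_fast * math.exp(-k_fast * t) + (1 - a_fast) * math.exp(-k_slow * t)


def apparent_rate(t: float, params: tuple, dt: float = 1e-4) -> float:
    """Apparent first-order rate, -d ln(f)/dt, by a forward finite difference."""
    f0 = fraction_bound(t, *params)
    f1 = fraction_bound(t + dt, *params)
    return -(math.log(f1) - math.log(f0)) / dt


# Hypothetical parameters (rates per minute), chosen only for illustration.
params = (0.4, 0.5, 0.02)
early = apparent_rate(0.0, params)
late = apparent_rate(60.0, params)
print(f"apparent rate at 0 min: {early:.3f}, at 60 min: {late:.3f}")
```

For a single-exponential process the two printed rates would be identical; a drifting rate is what distinguishes the biphasic case.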

Despite their increased forward reaction rates, the mutant apo RBPs appear to undock normally from STRA6 (Figure 6E). Likewise, mutant apo and holo RBPs bind TTR with an intrinsic affinity similar to WT (Figure 3) and are thus retained in carrier plasma (Figure 4). Indeed, the enhanced binding of mutant RBP to STRA6+ cell surfaces may in part explain the 1:2 ratio of mutant-to-WT protein in carrier plasma, which cannot be accounted for by unequal urinary loss (Figure S6). RBP normally contacts TTR via three external
loops (Figure 2D), which form the opening to the retinol pocket, 
and the C terminus (Newcomer and Ong, 2000). Although the 
structural details are not known, these same features are likely 
to mediate the interaction between RBP and STRA6, allowing retinol to exit (Kawaguchi et al., 2008). For steric reasons, the RBP-TTR complex must dissociate before receptor binding can occur. This step is driven by a 20-fold difference in the steady-state affinity of RBP for STRA6 versus TTR. The alanine substitutions affect RBP binding to the former, but not the latter. Structural studies of the mutant proteins may shed light on the conformational steps necessary for STRA6 docking and dissociation and vitamin A release.
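For intuition, a fold difference in equilibrium affinity can be expressed as a free-energy difference, ΔΔG = RT ln(Kd ratio). The sketch below applies this standard thermodynamic conversion to the 20-fold STRA6-versus-TTR difference at 37°C; it is an illustrative calculation, not a value reported in the study.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def ddg_from_fold(fold: float, temp_k: float = 310.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a fold difference in Kd."""
    if fold <= 0:
        raise ValueError("fold difference must be positive")
    return R_KCAL * temp_k * math.log(fold)


ddg = ddg_from_fold(20.0)  # 20-fold affinity difference at 37 °C
print(f"{ddg:.2f} kcal/mol")
```

Under this conversion, a 20-fold preference corresponds to a modest free-energy gap of under 2 kcal/mol.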

RBP4 Mutations, Diet, and Vitamin A Physiology 

Among organs, the eye is most frequently affected in animal 
models of vitamin A deficiency (Hale, 1933; See and Clagett- 
Dame, 2009; Warkany and Schraffenberger, 1946; Wilson 
et al., 1953). Our findings are consistent with this pattern. Despite 
a global reduction in vitamin A available to the embryo, pheno- 
types in Family 1 are limited to the eye. Given the central role 
of retinoids in light perception, this unique sensitivity is striking 
and may reflect an evolutionary origin of RA signaling in the visual 
system (Campo-Paysaa et al., 2008; Drager et al., 2000). Like- 
wise, in humans, total loss of RBP4 is only associated with night 
blindness, retinal dystrophy, and chorioretinal coloboma (Biesal- 
ski et al., 1999; Cukras et al., 2012). 

In addition to retinol, other forms of vitamin A (principally retinyl 
esters) are delivered to the placenta via chylomicron lipoprotein 
particles (D’Ambrosio et al., 2011; Wassef and Quadro, 2011). 
Indeed, 25% of postprandial retinoids, including retinyl esters 
(RE) and α/β-carotenoids, travel directly to extrahepatic tissues
from intestinal enterocytes via this parallel system, with no 
involvement of RBP (Goodman et al., 1965). However, because 
chylomicron RE are rapidly cleared (Berr, 1992), RBP accounts 
for 95%-99% of circulating retinoids in the fasting state (So- 
prano and Blaner, 1994). Accordingly, the extent and timing of 
maternal RE consumption during pregnancy, along with other 
genetic and/or environmental modifiers, may account for the 
variable penetrance. For women carrying an RBP4 mutation, 
careful dietary supplementation with extra vitamin A (RE) in 
divided doses may be indicated to minimize risk of congenital 
eye malformations in offspring. 

Recent nutritional studies showed that Rbp4-/- mouse pups born from Rbp4-/- dams were normal when the mothers were fed diets replete with RE (Quadro et al., 2005). However, these
pups developed microphthalmia or anophthalmia in the absence 
of dietary retinoids and the severity was determined by maternal 
vitamin A status. In our study, clinical phenotypes were roughly 
correlated with the magnitude of biochemical effects (Figure 5). 
Thus, both affected males in Family 2 (A55T) had neurodevelop- 
mental delay in addition to anophthalmia. The discovery of genes 
that modify plasma retinoid levels, apart from RBP4 and TTR 
(Mondul et al., 2011), may shed more light on this disease. 

Nutritional Mechanism for Maternal Inheritance of 
Human Genetic Disease 

Maternally skewed inheritance has been reported for other birth 
defects, including congenital heart disease (Burn et al., 1998; 
Nora and Nora, 1987), but the molecular basis is unknown. A 
study of scoliosis identified gestational hypoxia as an environmental factor that disrupts fibroblast growth factor signaling and somitogenesis, increasing penetrance of Notch pathway defects (Sparrow et al., 2012). Genetic vitamin A deficiency has
been previously suggested as a potential factor for eye malfor- 
mations (Hornby et al., 2003). Dominant-negative RBP4 alleles 
provide a further example of gene × environment effects. Unlike
other modes of maternal inheritance, e.g., transmission of 
ooplasmic mRNA, mitochondrial DNA mutations, or genomic 
imprinting, these alleles affect fetal and maternal metabolism at 
a functional level. The sex-specific penetrance has a physiolog- 
ical basis. Our findings highlight the importance of maternal-fetal 
nutrition and may apply broadly to congenital disease. 

EXPERIMENTAL PROCEDURES 

Clinical Data 

Human studies were approved by the University of Michigan (UM); University of California, Davis; and Einstein Medical Center Institutional Review Boards, and informed consent was obtained from all subjects. Eye exams, fundus photography, and magnetic resonance imaging (MRI) were performed at UM (Table S1). Blood tests for retinol, RBP, and TTR (prealbumin) were performed on carrier samples collected after a 12 hr fast (Table S3). MS of plasma and urine samples and other clinical studies are detailed in the Extended Experimental Procedures.

Genetic Analysis 

Family 1 genotypes were determined at 51 simple sequence length polymorphism (SSLP) and 6,070 SNP loci using blood, saliva, or buccal DNA. SNPs were assessed using the HL12 BeadChip platform and BeadStudio software (Illumina). Genetic mapping was performed in three steps. Exclusion tests were performed by comparing probands using SSLP markers flanking 23 candidate loci (Table S2). Multipoint linkage analysis was then performed on a core pedigree consisting of all living affected individuals, obligate carriers, and spouses (n = 20) using Merlin v1.1.2 (Abecasis et al., 2002). Finally, linkage analysis was extended to include all collected (n = 33; Figures S1C and S1D) and nodal family members. LOD scores from two subpedigrees (Figure S1D) were summed, discarding duplicate phenotypic information (Bellenguez et al., 2009) and applying an autosomal dominant (AD) inheritance model with uniform or sex-specific penetrance, estimated from the pedigree.
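Because independent pedigrees contribute independent likelihoods, their log10 likelihood ratios (LOD scores) add. The Python sketch below illustrates this with the simplest case, a phase-known two-point LOD score, using hypothetical meiosis counts; it does not reproduce the multipoint Merlin analysis, the handling of duplicate phenotypic information, or the penetrance models used here.

```python
import math


def two_point_lod(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Phase-known two-point LOD score at recombination fraction theta (0 <= theta < 0.5)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    n = recombinants + nonrecombinants
    if theta == 0:
        # Any recombinant makes complete linkage impossible.
        return float("-inf") if recombinants else n * math.log10(2)
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical subpedigrees: (recombinants, nonrecombinants)
subpedigrees = [(0, 6), (0, 4)]
total_lod = sum(two_point_lod(r, nr, theta=0.0) for r, nr in subpedigrees)
print(f"summed LOD at theta = 0: {total_lod:.2f}")
```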

To identify RBP4 coding variants, we screened 75 unrelated MAC probands and 307 controls (National Institute of Neurological Disorders and Stroke panel) by PCR Sanger sequencing (Table S5) and queried the Exome Variant Server (EVS) database. Chromosome 10q haplotypes of Families 2 and 3 were compared using the Omni1-Quad SNP platform (Illumina).

RBP Secretion and TTR Interaction Assays 

Parallel HeLa cultures were transfected with pUS2-RBP^HA vectors expressing WT, mutant (A55T, A57T, G73D, or I41N), or ER-retained WT human RBP proteins with an N-terminal HA epitope, or control plasmid (Table S6). After 48 hr, CM and cell lysates were electrophoresed through native or denaturing polyacrylamide gels and compared by HA western analysis. To evaluate RBP multimerization, CM was crosslinked in 0.5% volume per volume (v/v) glutaraldehyde for 30 min, boiled with or without 2 mM 2-mercaptoethanol (βME), and electrophoresed. To assess RBP-TTR binding in culture, HeLa cells were cotransfected with pUS2-TTR and WT or mutant pUS2-RBP^HA plasmids. Secreted RBP^HA complexes were immunopurified from CM with anti-HA agarose beads (Sigma), washed in PBS, and tested for TTR by western analysis. To fully assess RBP-TTR binding in vitro, reciprocal SPR assays were performed using a Biacore T100 system (GE Healthcare) and biotin capture chip with human plasma TTR (Sigma) and HA- or polyhistidine-tagged RBP purified from HeLa CM. Molecular cloning, cell culture, protein biochemistry, SPR, and data analysis are detailed in the Extended Experimental Procedures.

Retinol Binding Assays 

To assess retinol binding to RBP in culture, HeLa cells were transfected with WT or mutant pUS2-RBP^HA plasmids in Dulbecco’s Modified Eagle Medium (DMEM) containing 10% delipidated fetal bovine serum (FBS), metabolically labeled with 6 microcuries per milliliter (μCi/ml) 35S-methionine and -cysteine in serum-free DMEM for 1 hr, and exposed to 8.25 μCi/ml 3H-retinol (NEN Perkin-Elmer) for an additional 3 hr. Secretion of 35S-labeled RBP in the CM was assessed by gel electrophoresis and autoradiography. Radiolabeled RBP^HA was immunopurified from CM using anti-HA agarose beads, washed three times in PBS containing 1% Triton X-100 and 0.5% sodium DOC, and eluted in 2% sodium dodecyl sulfate (SDS). The 3H/35S isotope ratio was measured by liquid scintillation counting (LSC) and normalized to WT.
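The normalization described above reduces to a ratio of ratios: retinol (3H) per RBP molecule (35S) in each sample, divided by the same quantity for WT. A minimal sketch, assuming the counts have already been corrected for background and for 3H/35S channel spillover (a correction a dual-label LSC measurement needs but which is not shown here); all numbers are hypothetical.

```python
def normalized_retinol_loading(h3_cpm: float, s35_cpm: float,
                               wt_h3_cpm: float, wt_s35_cpm: float) -> float:
    """3H/35S ratio of a sample relative to the 3H/35S ratio of WT RBP."""
    if s35_cpm <= 0 or wt_s35_cpm <= 0 or wt_h3_cpm <= 0:
        raise ValueError("counts must be positive")
    return (h3_cpm / s35_cpm) / (wt_h3_cpm / wt_s35_cpm)


# Hypothetical corrected counts (cpm)
wt = normalized_retinol_loading(2000, 1000, 2000, 1000)
mutant = normalized_retinol_loading(500, 1000, 2000, 1000)
print(wt, mutant)
```

Using 35S as the denominator corrects for differences in how much labeled RBP each culture secreted and recovered.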

For in vitro titration assays, recombinant apo RBP^HA was immunopurified from serum-free HeLa CM, eluted with HA peptide (Anaspec), and dialyzed into PBS. Homogeneity was verified by gel electrophoresis. Equal amounts of WT, A55T, or A57T RBP^HA proteins were loaded with 0 to 5 μM fresh all-trans retinol for 1 hr in PBS. Binding was quantified by retinol fluorescence enhancement (excitation 330 nm, emission 460 nm) (Cogan et al., 1976) using a microplate reader. To assess binding in nonpolar or amphipathic conditions, parallel assays were performed in PBS with 0 to 50% ethanol; 1% Triton X-100, 0.5% DOC for 0 to 90 min; or 0.1% α-L-PC vesicles dispersed in 5% n-butanol.

STRA6-RBP Binding Radioligand Assay

Immunopurified 35S-labeled apo WT, holo WT, A55T, or A57T RBP^HA (5 picomolar [pM] at 1.2 x 10^ counts per minute/picomole specific activity) was added, with or without an 8- or 250-fold excess of unlabeled holo WT competitor, to paired HEK293T cultures transfected with pUS2-STRA6^myc or pUS2 vector plasmid DNA. After 1 hr at 37°C, the cells were gently washed with prewarmed PBS (Kawaguchi et al., 2007), and the amount of bound 35S was determined by LSC. Receptor-specific binding was calculated by subtracting the vector control. STRA6^myc expression was verified by myc immunofluorescence and STRA6 western analysis.

Equilibrium and Kinetic Analysis 

STRA6+ or control HEK293T cells were plated on poly-D-lysine (PDL)-coated dishes and incubated with CM containing 0-80 μg/ml A55T, A57T, or WT apo RBP^HA and 0.5% BSA for 90 min at 37°C. For Kd analysis, monolayers were washed with ice-cold Hanks' balanced salt solution (HBSS), and bound RBP was eluted with 25 mM glycine HBSS, pH 3. For kinetic analysis, monolayers were washed with DMEM 0.5% BSA, and dissociation was followed at 25°C or 37°C by sampling the media at time points from 0 to 90 min. The concentration of RBP^HA was determined by an ELISA (Figure S7). The immunoassay, saturation binding and kinetic methods, and quantitative analysis are detailed in the Extended Experimental Procedures.
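For a single-exponential dissociation, B(t) = B0·exp(-koff·t), so ln B is linear in time with slope -koff. The Python sketch below estimates koff by ordinary least squares on ln B. It assumes the amount still bound has already been derived from the released RBP measured by ELISA (bound = total - released); the study's actual fitting method is described in the Extended Experimental Procedures, and a nonlinear fit is often preferred in practice. The data are synthetic.

```python
import math


def koff_from_timecourse(times, bound):
    """Least-squares slope of ln(bound) versus time; returns koff (1/time unit)."""
    if len(times) != len(bound) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    ys = [math.log(b) for b in bound]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    return -sxy / sxx


# Synthetic time course (min) generated with koff = 0.05 per min
times = [0, 10, 20, 30, 45, 60, 90]
bound = [100 * math.exp(-0.05 * t) for t in times]
koff = koff_from_timecourse(times, bound)
half_life = math.log(2) / koff
print(f"koff = {koff:.3f} per min, t1/2 = {half_life:.1f} min")
```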

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven figures, and six tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.03.006.

SUMMARY

How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to "edgetic" alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leading to different interaction profiles often result in distinct disease phenotypes. Thus disease-associated alleles that perturb distinct protein activities rather than grossly affecting folding and stability are relatively widespread.

INTRODUCTION 

Over a hundred thousand genetic variants have been identified across a large number of Mendelian disorders (Amberger et al., 2011), complex traits (Hindorff et al., 2009), and cancer types (Chin et al., 2011). However, many fundamental questions regarding genotype-phenotype relationships remain unresolved (Vidal et al., 2011). One critical challenge is to distinguish causal disease mutations from non-pathogenic polymorphisms. Even when causal mutations are identified, the functional consequence of such mutations is often elusive (Sahni et al., 2013).

Figure 1. Systematic Characterization of Human Disease Missense Mutations

(A) Two possible effects of missense disease mutations: protein folding/stability changes and molecular interaction perturbations.

(B) Understanding mutational effects by edgotyping links genotype to phenotype. Solid and dashed lines represent retained and perturbed interactions, respectively.

(C) Experimental pipeline for characterizing alterations of molecular interactions, including protein-chaperone (PCI), protein-protein (PPI), and protein-DNA (PDI) interactions. WT, wild-type; Mut, mutation; TF, transcription factor. "1," PPI detected; "0," PPI not detected. Dashed oval: variants in the same gene. See also Figure S1.

Genotypic information alone rarely provides mechanistic insight into disease pathogenesis. Although genotype-phenotype relationships can be modeled under the assumption that most disease-associated mutations lead to complete loss of protein function, e.g., through radical changes such as protein misfolding and instability (Subramanian and Kumar, 2006) (Figure 1A), the reality is often more complex, as in the case of mutations affecting the same gene but giving rise to clinically distinguishable diseases (Zhong et al., 2009). In addition, since genes and gene products do not function in isolation but interact with each other in the context of interactome networks (Vidal et al., 2011), it is likely that many diseases result from perturbations of such complex networks (Goh et al., 2007).

Missense mutations are among the 
most common sequence alterations in 
Mendelian disorders, accounting for 
more than half of all reported mutations 
in the Human Gene Mutation Database 
(HGMD) (Stenson et al., 2014). In princi- 
ple, missense mutations may have no 
functional consequences, disrupt the 
three-dimensional structure of the corre- 
sponding protein, or exert specific effects 
on particular molecular or biochemical 
interactions (Figure 1A), such as protein- 
protein interactions (PPIs), protein-DNA 
interactions (PDIs), or enzyme-substrate 
interactions, while leaving all other func- 
tional properties unperturbed. We previ- 
ously reported that a considerable por- 
tion of Mendelian disease mutations 
could indeed be predicted computation- 
ally to cause interaction-specific, or “edgetic,” perturbations 
(Zhong et al., 2009). However, only a small number of genes 
and associated mutations were experimentally tested in that 
study, and the extent to which disease mutations globally lead 
to interaction perturbations remains to be determined. 

Here we describe a multi-pronged approach to systematically 
decipher molecular interaction perturbations associated with 
missense mutations. Since chaperones and associated quality 
control factors (QCFs) can salvage unstable proteins by assisting 
with folding, and an increase in protein-chaperone interactions 
(PCIs) has been observed for a number of disease mutants 
(Whitesell and Lindquist, 2005), our systematic approach begins 
with characterizing PCIs for large numbers of disease-associ- 
ated alleles, followed by systematic measurements of PPI and 
PDI profile changes caused by mutations, a strategy referred to as "edgotyping" (Figure 1B).

We provide evidence for widespread interaction perturbations 
across a broad spectrum of human Mendelian disorders. Our 
results suggest that interaction profiling helps distinguish dis- 
ease-causing mutations from common variants. Furthermore, 
the integration of different types of molecular interactions ex- 
pands our ability to understand complex genotype-phenotype 
relationships. 

RESULTS 

Human Mutation ORFeome Version 1.1 

To globally characterize disease-associated alleles, we selected 
mutations associated with a wide range of disorders, including 
cancer susceptibility and heart, respiratory, and neurological 
diseases. We retrieved from HGMD (Stenson et al., 2014) a list of ~16,400 mutations affecting over 1,200 genes for which we have a wild-type (WT) open-reading frame (ORF) clone in our human "ORFeome" collection (Yang et al., 2011) and selected up to four mutations per gene (Figure 1C; Tables S1A and S1B; Extended Experimental Procedures). Using properties related to RNA abundance, GO annotation, and protein domains (Extended Experimental Procedures), we verified there is no significant bias between our selected genes and the rest of the human genome or all genes represented in HGMD (Figures S1B-S1G).

Altogether, we cloned and sequence-verified 2,890 human 
mutant ORFs (hmORFs), each harboring a single nucleotide 
change that results in an amino acid change relative to the corresponding WT ORF of 1,140 genes. To our knowledge, this human mutation ORFeome version 1.1 resource (hmORFeome1.1; Figure S1A) is the most extensive human mutation collection reported to date.

Disease Mutations and Protein Folding and Stability 

Using enhanced binding to a chaperone as an indicator of pro- 
tein instability or misfolding, we examined how disease muta- 
tions impact protein folding and disposition. We determined 
the extent to which hmORF-encoded proteins and their WT 
counterparts interact with QCFs using a quantitative high-throughput LUMIER assay (Taipale et al., 2012; Taipale et al., 2014) (Figure 1C and Table S2A). We selected the following QCFs based on their broad specificity (Taipale et al., 2014): (1) the cytoplasmic chaperones HSP90 and HSC70, (2) their co-chaperones BAG2 and CHIP/STUB1, (3) the proteasomal regulatory subunit PSMD2 (formerly known as RPN1), and (4) the ER chaperones GRP78/BIP and GRP94 (Extended Experimental Procedures). We did not survey mitochondrial chaperones since only ~7% of disease-associated gene products are predicted to localize solely in mitochondria (Huntley et al., 2015).

Increased interaction of a QCF with a mutant protein relative to its WT counterpart, as measured by the LUMIER assay, indicates a mutation-induced perturbation in conformational stability, often associated with compromised or complete loss of function (Taipale et al., 2012). The interaction profiles of most mutant proteins correlated with those of their WT counterparts. However, compared to a background control set, we observed a significant enrichment of mutant alleles showing increased interaction with QCFs (Figures 2A-2H and S2A) but little or no enrichment for decreased interaction (Figures 2A and S2B; Extended Experimental Procedures). The interaction profiles of mutant proteins with the different cytoplasmic QCFs were highly correlated and distinct from those with ER factors (Figure 2I). These results highlight the coordination and specificity of cellular quality control pathways. Altogether, ~28% of the tested alleles exhibited increased binding to at least one of the seven QCFs tested. Although this fraction is likely a conservatively low estimate due to limited assay sensitivity, the strong correlation between chaperone interaction profiles (Figure 2I) suggests that the estimate would not increase substantially by assaying more chaperones. We validated several mutant-specific interactions with endogenous chaperones by co-immunoprecipitation followed by western blot, corroborating the results obtained with the LUMIER assay (Figure 2J).
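One way to read the differential scoring summarized in the Figure 2 legend is sketched below: take the mutant-minus-WT difference in chaperone-binding Z score for each pair, then express it in standard-deviation units of a control distribution (standing in for the non-expressed pairs). This is a hypothetical simplification for illustration; the exact normalization and significance cutoff (here an assumed threshold of 2) are described in the Extended Experimental Procedures and may differ.

```python
from statistics import mean, stdev


def differential_scores(pairs):
    """Mutant minus WT LUMIER Z score for each (mutant_z, wt_z) pair."""
    return [mut - wt for mut, wt in pairs]


def normalize_to_controls(values, controls):
    """Express values in SD units of the control distribution."""
    mu, sd = mean(controls), stdev(controls)
    return [(v - mu) / sd for v in values]


def flag_increased_binding(normalized, threshold=2.0):
    """Assumed cutoff: flag alleles whose normalized score exceeds threshold."""
    return [v > threshold for v in normalized]


# Hypothetical data
pairs = [(4.1, 1.1), (0.9, 0.4)]   # (mutant Z, WT Z)
control_diffs = [-1.0, 0.0, 1.0]   # differences from control pairs
scores = normalize_to_controls(differential_scores(pairs), control_diffs)
flags = flag_increased_binding(scores)
print(scores, flags)
```

Applying the same flag across several QCFs and taking the union would give the kind of "increased binding to at least one QCF" fraction reported above.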

We next estimated protein abundance using semiquantitative 
ELISA, which provides a proxy for steady-state protein stability. 
Although the expression levels of mutant alleles correlated with 
their WT counterparts (Figure S2C), mutant proteins exhibiting 
enhanced interactions with cytoplasmic, but not ER, chaperones 
were detected at lower steady-state levels than their WT coun- 
terparts (p < 1.0 X 10“"^; Figure 3A). This is possibly a result of 
retention in the ER of mutant proteins that would normally be 
secreted and therefore not be detected by an assay that cap- 
tures intracellular proteins. Interestingly, recessive alleles ex- 
hibited lower protein abundance levels and increased binding 
with QCFs compared with proteins encoded by dominant alleles 
(Figures S2D and S2E). This is consistent with the hypothesis 
that recessive mutations are more likely to result in loss-of-func- 
tion phenotypes than dominant mutations (Lesage and Brice, 
2009). 

To gain insight into the structural properties of mutant proteins 
that exhibit increased binding to QCFs, we assessed the impact 
of different disease mutations on predicted protein structures. 
The disease alleles associated with increased binding to QCFs 
corresponded significantly more often to mutations of residues 
buried in the core of the protein (Figure 3B and Table S1C), and less often to mutations in intrinsically disordered regions (Figure 3C) when compared to mutant proteins with no change in binding. Next, we estimated the relative "deleteriousness" associated with distinct genetic mutations using the PolyPhen-2 algorithm (Adzhubei et al., 2010). Deleterious mutations predicted
by PolyPhen were significantly enriched in alleles that exhibited 
increased binding to QCFs (Figure S2F). 

Previous studies suggested that increased chaperone binding 
reflects a change in protein stability (Falsone et al., 2004; Taipale 
et al., 2012). To provide further evidence for this, we assessed 
protein stability in cellular lysates by measuring solubility in a 
cellular thermal shift assay (CeTSA). We found that the majority 
(5 of 6) of mutant proteins with increased chaperone binding 
also exhibited decreased stability as measured by CeTSA (Fig- 
ures S3A-S3D). In addition, computational predictions by the 
FoldX program (Schymkowitz et al., 2005) suggest that mutant 
proteins with increased binding to QCFs are likely to be significantly less stable than their WT counterparts (Figure 3D and Table S2B). Taken together, experimental and computational analyses suggest that mutant proteins with enhanced binding to QCFs have a destabilized protein structure.

Figure 2. Most Disease Missense Mutations Do Not Impair Protein Folding or Stability

(A) Differential Z score distributions in the LUMIER assay. Normalized differential Z scores are calculated as the difference in chaperone binding between all mutant/WT pairs expressed at detectable levels (n = 12,131). Non-expressed pairs serve as controls (n = 1,567).

(legend continued below)

Figure 3. Mutant Proteins with Enhanced Binding to QCFs Are Likely to Be Unstable

(A) Protein expression levels measured by ELISA. X axis shows all tested mutants (All) and mutants with no change (non-binding) or an increase in binding to QCFs.

(B) Solvent accessibility of mutant proteins.

(C) Disorder analysis of mutant proteins.

(D) Stability predictions by FoldX. ΔΔG, free energy change.

Dashed line (A and D) represents the median of all mutants. p values for (A) and (D) by one-sided Wilcoxon rank sum test; for (B) and (C) by chi-square test. For n values, see Table S7B. *p < 0.05; **p < 0.01; ***p < 0.001. See also Figures S2 and S3.

Our quantitative survey of allele-specific interactions estimates that the majority of missense disease mutations do not dramatically impact protein structure or folding (Tables S1D and S2). Therefore, they may exert their deleterious effects through other mechanisms, such as perturbation of molecular interactions.

Disease Mutations and PPI Perturbations

In principle, the effects of missense disease mutations on molecular interactions (Zhong et al., 2009), or "edgotype" (Sahni et al., 2013), could range from no apparent detectable change in interactions ("quasi-WT"), to specific loss of some interaction(s) ("edgetic"), to an apparent complete loss of interactions ("quasi-null") (Figure 4A). To systematically characterize PPI perturbations associated with disease mutations and identify potential gain of interactions, we used the yeast two-hybrid (Y2H) interaction assay followed by a stringent validation assay. After autoactivator removal, we screened 2,449 mutant proteins and their 1,072 corresponding WT proteins for interactions with proteins encoded by the ~7,200 ORFs in the human ORFeome v1.1 (Rual et al., 2004). Mutant and WT proteins were then tested pairwise against all partners found both in these Y2H screens and in our human interactome map HI-II-14 (Rolland et al., 2014) (Figure 1C). Altogether, we obtained interaction profiles for 460 mutant proteins and their 220 WT counterparts and found 521 perturbed interactions out of 1,316 PPIs (Table S3A).

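The three edgotype classes defined above can be expressed as a simple rule over the set of WT interaction partners that a mutant retains. A minimal Python sketch, assuming interactions are reduced to detected/not detected and that the WT protein has at least two partners (the restriction applied in the 197-mutation analysis); gained interactions, which were rare, are ignored here. The partner names are hypothetical.

```python
def classify_edgotype(wt_partners, mutant_partners):
    """Classify a mutant allele by the WT interactions it retains."""
    wt = set(wt_partners)
    if len(wt) < 2:
        raise ValueError("edgotyping here assumes >= 2 WT partners")
    retained = wt & set(mutant_partners)
    if retained == wt:
        return "quasi-WT"    # no detectable loss of WT interactions
    if not retained:
        return "quasi-null"  # all WT interactions lost
    return "edgetic"         # only a subset of WT interactions lost


# Hypothetical partner sets
wt = {"P1", "P2", "P3"}
print(classify_edgotype(wt, {"P1", "P2", "P3"}))
print(classify_edgotype(wt, {"P2"}))
print(classify_edgotype(wt, set()))
```

Requiring at least two WT partners matters: with a single partner, "edgetic" cannot be distinguished from "quasi-null".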
To validate these results, we used the 
orthogonal in vivo Gaussia princeps lucif- 
erase protein complementation assay (GPCA) performed in hu- 
man 293T cells (Cassonnet et al., 2011) (Table S3B). Unperturbed 
interactions were recovered at a rate statistically indistinguishable 
from that of a well-documented positive reference set (PRS), 
similar to the interactions of the WT alleles (Braun et al., 2009; Ven- 
katesan et al., 2009). Perturbed interactions were recovered at a 
rate as low as a negative control “random reference set” (RRS) 
(Figures 4B and S4A), demonstrating the high quality of the identi- 
fied perturbations induced by disease mutations. 
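The comparison with the PRS and RRS amounts to computing, for each set of protein pairs, the fraction scoring above a GPCA threshold together with its standard error (the Figure 4B legend reports the SE of the proportion). A Python sketch using the binomial SE, sqrt(p(1 - p)/n), on hypothetical scores; the function name is illustrative.

```python
import math


def recovery_rate(scores, threshold):
    """Fraction of pairs with score >= threshold and its binomial standard error."""
    if not scores:
        raise ValueError("no scores")
    n = len(scores)
    p = sum(s >= threshold for s in scores) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


# Hypothetical GPCA scores for one reference set, scanned over thresholds
scores = [1.0, 2.0, 3.0, 4.0]
for thr in (1.5, 2.5, 3.5):
    p, se = recovery_rate(scores, thr)
    print(f"threshold {thr}: {p:.2f} +/- {se:.2f}")
```

Plotting these values for the PRS, RRS, unperturbed, and perturbed sets at increasing thresholds gives curves of the kind summarized in Figure 4B.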

To analyze global and topological characteristics of gene 
products with edgetic, quasi-null, or quasi-WT mutations, we 
used the human interactome map HI-II-14 (Rolland et al., 2014). According to the studied network properties (betweenness, k-core centrality, degree, closeness), the nodes (genes)
examined in our edgotyping study appear unbiased, in that their 
topological properties are statistically indistinguishable from 
other genes in the network (Figures S4B-S4F). Interestingly, 
we found that the genes carrying edgetic mutations tend to be 
more central than either non-edgetic genes or the rest of the 
network (Table S4). 
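Of the network properties listed above, k-core centrality is perhaps the least familiar: a node's core number is the largest k such that the node belongs to a subgraph in which every node has degree at least k. A compact Python sketch of the standard peeling procedure on a hypothetical toy network follows (libraries such as NetworkX provide an optimized `core_number` function).

```python
def core_numbers(adjacency):
    """Core number of each node in an undirected graph given as {node: set(neighbors)}.

    Repeatedly removes a node of minimum remaining degree; a node's core number
    is the largest minimum degree seen up to the moment it is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda node: degree[node])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Hypothetical toy network: a triangle A-B-C plus a pendant node D on A
graph = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(core_numbers(graph))
```

Node A has the highest degree here but shares core number 2 with B and C, which illustrates why degree and k-core centrality are reported as separate properties.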

Figure 2 (continued)

(B-H) Interaction scatter plots for 2,332 disease alleles. Alleles were assayed for interaction with QCFs HSP90 (B), HSC70 (C), BAG2 (D), CHIP (E), PSMD2 (F), GRP78 (G), and GRP94 (H). EGFR L858R and v-Src interact with HSP90 (Shimamura et al., 2005; Taipale et al., 2012), and TTR D18G and ELANE G181V interact with GRP78 (Kollner et al., 2006; Sorgjerd et al., 2006); these were therefore used as controls. Filled circles with black border represent significantly increased chaperone binding. Correlations by Pearson coefficient of determination, R².

(I) Clustering analysis based on chaperone interaction profile similarity.

(J) Validation by co-immunoprecipitation (co-IP). LUMIER scores are shown below the blots.

See also Figure S2.

Figure 4. Interaction Perturbation Profiles Distinguish Disease Mutations from Non-Disease Variants

(A) Three classes of PPI profiles (edgotypes) for mutations.

(B) Percentage of protein pairs recovered in GPCA at increasing score thresholds. Shading indicates SE of the proportion.

(C) Distribution of different edgotype classes for disease mutations.

(D and E) Differential LUMIER interaction scores among different edgotype classes, for binding to HSP90 (D) and HSC70 (E). p values by one-sided unpaired t test.

(F) Differential expression among different edgotype classes (ELISA log2 ratio of mutant over WT). QW, quasi-WT (n = 75); E, edgetic (n = 49); QN, quasi-null (n = 42). p values by one-sided Wilcoxon rank sum test.

(G) Distribution of different edgotype classes for non-disease variants.

(H) Increased binding to HSP90, HSC70, or either (Union) for non-disease (N) or disease (D) variant proteins. p values by one-sided Fisher’s exact test. Error bars indicate SE of the proportion. *p < 0.05.

See also Figures S4 and S5.

Out of a total of 197 mutations, corresponding to 89 WT proteins with two or more interaction partners, our interaction profiling identified 26% as quasi-null alleles, 31% as edgetic, and 43% as quasi-WT (Figure 4C and Table S3C). We also analyzed disease mutations annotated by ClinVar (Landrum et al., 2014) and found the distribution of quasi-null, edgetic, and quasi-WT alleles was statistically indistinguishable from that of HGMD (Figure S4G). We identified only two mutations that conferred PPI gains, suggesting that gain of interactions may be a rare event in human disease.
Protein Folding and Expression Levels of Edgetic Mutations

Differences between edgotype classes could be due to protein 
folding and/or relative expression levels. Quasi-null proteins 
associated significantly more with cytoplasmic, but not ER, 
chaperones, whereas edgetic and quasi-WT proteins did not 
significantly change their chaperone association (Figures 4D- 
4E, and S5A-S5E). Quasi-null proteins appeared to be poorly 
expressed, while edgetic and quasi-WT proteins were ex- 
pressed at levels similar to those of their WT controls (Figure 4F). 
We validated several mutant-chaperone interactions and ex- 
pression profiles by co-immunoprecipitation with endogenous 
chaperones, followed by western blot (Figure S5F). All tested 
quasi-null proteins exhibited more binding to HSP90 and HSC70, although they were expressed at lower levels than their WT controls. However, the edgetic TAT-P220S protein and the
quasi-WT NCF2-R395W protein did not show any detectable 
chaperone association. Among mutant proteins with no change 
in chaperone binding, edgetic (28%) and quasi-WT (57%) 
proteins comprised the majority, while quasi-null proteins 
comprised a significantly lower percentage (15%) (Figure S5G). 
Altogether, these results suggest that quasi-null proteins are 
more often unstable/misfolded and diminished in their steady- 
state expression levels. In contrast, edgetic and quasi-WT pro- 
teins likely exhibit normal folding and expression levels, further 
supporting the idea that they may cause disease through inter- 
action perturbations or other mechanisms rather than simple 
loss of protein function. 

Disease-Causing Mutations Versus Common Variants 

Genome-wide association studies have identified hundreds of 
loci linked to particular disorders. However, these loci often 
contain several genes and multiple variants, making it chal- 
lenging to distinguish causal mutations from non-pathogenic 
variants. We observed previously that among binary interactions 
found by WT proteins, disease-causing alleles were more likely 
to perturb interactions than non-disease variants (Rolland 
et al., 2014). We further investigated both disease-causing al- 
leles from HGMD and common variants identified in healthy indi- 
viduals from diverse geographical sites (1000 Genomes Project 
Consortium, 2012) (Table S1A) with respect to the edgetic character and chaperone binding of their protein products. Interaction profiling showed that only a small fraction of non-disease
alleles lost interactions (8%, Figure 4G), a 7-fold reduction rela- 
tive to disease mutations (57%; p = 1.7 x 10“®; Figure 4C). In 
addition, non-disease alleles on average did not alter chaperone 
association (Table S2A), a characteristic distinct from disease 
mutations annotated by HGMD (Figure 4H) or ClinVar (Fig- 
ure S5H). Together, interaction perturbations can help distin- 
guish disease-associated alleles from non-disease alleles. 

To assess the predictive power of edgotyping to identify 
disease-causing mutations, we determined its precision and 
sensitivity in classifying an allele as causal based on interaction 
perturbation profiles. As a “gold standard” for causal alleles, we 
used a set of mutations annotated in HGMD as disease-causing 
(“DM” in Table S1A). As a negative control, we used a set of alleles
most likely not associated with disease. We observed that 96% 
(105 of 109) of the alleles found to perturb interactions (E or QN)
were disease-causing (Figure S6A). Conversely, 61% (105 of
172) of disease-causing mutations annotated by HGMD were
interaction-perturbing (Figure S6B). Together, our prediction
achieved a precision (96%) and sensitivity (61%) significantly 
higher than random expectation. It is possible that current incom- 
pleteness of interaction network maps might limit the power of 
edgotyping to properly classify disease-causing mutations. To 
evaluate this possibility, we performed a down-sampling analysis 
and found negligible effect on mutation classification over a broad 
range of network sizes (Figure S6C). 
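The precision and sensitivity quoted above follow directly from the reported counts. The sketch below simply recomputes them from the numbers in the text (105 of 109 interaction-perturbing alleles are disease-causing; 105 of 172 HGMD "DM" mutations are interaction-perturbing); the helper names are illustrative and are not part of the study's pipeline.

```python
def precision(true_positives: int, predicted_positives: int) -> float:
    """Fraction of alleles called interaction-perturbing (E or QN) that are disease-causing."""
    return true_positives / predicted_positives


def sensitivity(true_positives: int, actual_positives: int) -> float:
    """Fraction of HGMD disease-causing ("DM") mutations called interaction-perturbing."""
    return true_positives / actual_positives


# Counts quoted in the text (Figures S6A and S6B).
tp = 105        # interaction-perturbing alleles that are disease-causing
called = 109    # all alleles found to perturb interactions (E or QN)
dm_total = 172  # HGMD disease-causing mutations that were profiled

print(f"precision   = {precision(tp, called):.0%}")     # precision   = 96%
print(f"sensitivity = {sensitivity(tp, dm_total):.0%}")  # sensitivity = 61%
```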

Edgetic Mutations and Interaction Interfaces 

To explore edgotypes from a structural point of view, we assessed the possible impact of distinct classes of mutations on
protein function using PolyPhen-2 analysis (Adzhubei et al., 
2010). Interaction-perturbing mutations are significantly more 
often predicted to be deleterious than non-interaction-perturb- 
ing mutations (Figure 5A). We next investigated whether muta- 
tions from the different classes might differ in evolutionary 
conservation, based on the presumption that conservation of 
amino acid residues is a property that generally reflects function- 
ality (1000 Genomes Project Consortium, 2012; Subramanian 
and Kumar, 2006; Sunyaev, 2012). The residues affected by 
interaction-perturbing mutations are significantly more con- 
served across species compared to non-interaction-perturbing 
mutations (Figure S6D). However, PolyPhen and conservation
analysis could not distinguish between edgetic and quasi-null 
mutations within the interaction-perturbing group. 

Given that structural domains often mediate protein interac- 
tions, different classes of mutation might vary in their locations 
relative to protein domains. Interaction-perturbing mutations 
are indeed significantly enriched within structural domains 
compared to non-interaction-perturbing alleles (Figure 5B and 
Table S1C). In addition to structural domains, intrinsically disordered regions and linear motifs could also play a role in mediating PPIs. However, we found interaction-perturbing disease
alleles to be depleted in intrinsically disordered regions (Fig- 
ure S6E), and occurring in linear motifs as frequently as non-per- 
turbing alleles (Figure S6F). These results suggest that mutations 
perturbing PPIs are preferentially located within structural do- 
mains. Nevertheless, none of the above properties could reliably 
predict whether a mutation would give rise to an edgetic or 
quasi-null PPI effect. 
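Figure 5B reports a one-sided position-shuffling test for the domain enrichment above, but the exact shuffling scheme is given only in the Extended Experimental Procedures. The following is a minimal sketch under one assumed scheme: each mutation is moved to a uniformly random residue of the same protein, and the p-value is the fraction of shuffles with at least as many in-domain mutations as observed. Protein lengths, domain intervals, and the function names are hypothetical inputs for illustration.

```python
import random


def in_domain(pos, domains):
    """True if residue `pos` (1-based) lies in any inclusive (start, end) domain interval."""
    return any(start <= pos <= end for start, end in domains)


def position_shuffling_test(mutations, proteins, n_shuffles=10_000, seed=0):
    """One-sided test for enrichment of mutations inside structural domains.

    mutations: list of (protein_id, residue_position) pairs.
    proteins:  dict protein_id -> (protein_length, list of (start, end) domain intervals).
    Returns (observed in-domain count, empirical one-sided p-value).
    Assumption: the null keeps each mutation in its own protein but places it
    at a uniformly random residue.
    """
    rng = random.Random(seed)
    observed = sum(in_domain(pos, proteins[pid][1]) for pid, pos in mutations)
    at_least_as_many = 0
    for _ in range(n_shuffles):
        shuffled = sum(
            in_domain(rng.randint(1, proteins[pid][0]), proteins[pid][1])
            for pid, _ in mutations
        )
        if shuffled >= observed:
            at_least_as_many += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (at_least_as_many + 1) / (n_shuffles + 1)


# Hypothetical example: two proteins, five mutations that all fall in domains.
proteins = {"P1": (100, [(10, 30)]), "P2": (200, [(50, 90)])}
mutations = [("P1", 12), ("P1", 25), ("P1", 28), ("P2", 60), ("P2", 88)]
print(position_shuffling_test(mutations, proteins))
```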

We next investigated whether edgetic and quasi-null muta- 
tions differ in their physical location within three-dimensional 
protein structures (Zhong et al., 2009). Edgetic mutations are 
significantly more enriched in structurally exposed residues 
compared to quasi-null mutations (Figure 5C). Consistently, 
edgetic mutations do not tend to cause a change in hydropho- 
bicity, a destabilizing feature that generally disrupts protein func- 
tion (Balasubramanian et al., 2005), while quasi-null mutations 
often lead to a decrease in hydrophobicity (Figure S6G). 

We also investigated whether or not edgetic mutations are 
more frequently located at an interface that supports interaction 
with a partner protein. Starting from all available structures of co- 
crystal complexes in the Protein Data Bank (PDB) involving a dis- 
ease gene product, we determined the relative location of each 
mutated residue within these structures (Extended Experimental 
Procedures and Table S5A). In contrast to quasi-null mutations,
edgetic mutations are significantly enriched at interaction inter- 
faces identified from the corresponding co-crystal structures 
(Figure 5D). Notably, edgetic mutations also exhibit a significant 
tendency to reside at interaction interfaces with the perturbed 
partners, as compared to unperturbed partners or random con- 
trols (Figure 5E). These results suggest that edgetic mutations 
are preferentially located at PPI interfaces, perturbing the corre- 
sponding interaction. 
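Many of the comparisons in this section, including interface enrichment (Figures 5D and 5E), are assessed with a one-sided Fisher's exact test on a 2 × 2 table. A self-contained sketch of that test from the hypergeometric distribution is shown below; rows could be, for example, edgetic versus quasi-null mutations and columns at an interface versus not, but the per-cell counts are not given in the text, so any table passed to it here is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value is P(X >= a), the chance of seeing at least `a`
    by chance (alternative: positive association, odds ratio > 1).
    """
    row1 = a + b          # e.g. number of edgetic mutations
    col1 = a + c          # e.g. number of mutations at an interface
    n = a + b + c + d
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total
```

In practice, `scipy.stats.fisher_exact(table, alternative="greater")` should give the same one-sided p-value; the hand-rolled version just makes the hypergeometric tail explicit.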

Edgetic Mutations Perturb Interactions with Protein 
Partners Expressed in Disease-Relevant Tissues 

We hypothesized that protein interaction partners perturbed by 
edgetic mutations are likely to function together within the tissue 
known to be affected by the relevant disease. To test this, we 
compared gene expression patterns for perturbed and unper- 
turbed partners in disease-relevant tissues using RNA-seq 
data from the Illumina Human Body Map 2.0 project. Perturbed
interactors exhibit a striking tendency to be expressed in disease-relevant tissues compared with unperturbed interactors or random genes (Figures 5F and S6H; Table S5B). These results indicate that disease mutations most often perturb interactions that are functionally relevant in the particular tissue(s) affected by a specific disease.

Figure 5. Edgetic Mutations Perturb Interaction Interfaces with Protein Partners Expressed in Disease-Relevant Tissue

(A) PolyPhen-2 scores for mutations in different edgotype classes, p values by one-sided Wilcoxon rank sum test.

(B) Percentage of mutations within Pfam domains, p values by one-sided position-shuffling test.

(C) Percentage of mutations in exposed residues. QW: n = 83; E: n = 61; QN: n = 50.

(D) Percentage of mutations at PPI interfaces. QW: n = 59; E: n = 32; QN: n = 16.

(E) Percentage of interfacial mutations for perturbed (n = 14) and unperturbed (n = 18) interactions, compared to random mutations.

(F) Percentage of perturbed (n = 118) and unperturbed (n = 85) interactors expressed in disease-relevant tissues. Thirty random genes from the RNA-seq dataset are assessed for each disease gene, p values from (C) to (F) by one-sided Fisher’s exact test. Error bars (B) to (F), SE of the proportion. See also Figure S5.



Distinct Interaction Perturbations May Underlie Diverse Disease Phenotypes

Our edgotyping model suggests that 
different mutations in the same gene 
may result in different, pleiotropic pheno- 
typic outcomes through perturbation of 
distinct interactions (Figure 6A). To test 
this, we compared mutation edgotype 
classes and the resulting disease phenotypes. Among pleio- 
tropic genes associated with two or more diseases, mutant al- 
leles associated with different disease manifestations were 
more likely to exhibit different edgotype classes of perturbed 
PPI profiles (Table S5C). 
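The pairwise comparison summarized in Figure 6A can be sketched as a simple cross-tabulation: within each pleiotropic gene, every pair of mutant alleles is labeled by whether the two share an edgotype class and whether they cause the same disease. The record format and the gene and disease labels below are hypothetical; the resulting 2 × 2 counts would then go into a one-sided Fisher's exact test, as in the figure legend.

```python
from collections import Counter
from itertools import combinations


def tally_mutation_pairs(mutations):
    """Cross-tabulate within-gene mutation pairs by edgotype and disease agreement.

    mutations: iterable of (gene, edgotype_class, disease) tuples.
    Only pleiotropic genes (alleles linked to two or more diseases) are used.
    Returns a Counter keyed by (edgotype_relation, disease_relation), each
    relation being "same" or "different".
    """
    by_gene = {}
    for gene, edgotype, disease in mutations:
        by_gene.setdefault(gene, []).append((edgotype, disease))

    table = Counter()
    for alleles in by_gene.values():
        if len({disease for _, disease in alleles}) < 2:
            continue  # skip genes associated with a single disease
        for (e1, d1), (e2, d2) in combinations(alleles, 2):
            table[("same" if e1 == e2 else "different",
                   "same" if d1 == d2 else "different")] += 1
    return table


# Hypothetical alleles; GENE_B is skipped because it maps to one disease only.
example = [
    ("GENE_A", "edgetic", "disease 1"),
    ("GENE_A", "edgetic", "disease 1"),
    ("GENE_A", "quasi-WT", "disease 2"),
    ("GENE_B", "quasi-null", "disease 3"),
    ("GENE_B", "edgetic", "disease 3"),
]
print(tally_mutation_pairs(example))
```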

This is exemplified by mutations in TPM3, which encodes 
slow muscle alpha-tropomyosin. Three TPM3 edgetic mutations,
L100M, R168G, and R245G, are associated with fiber-type
disproportion myopathy through an unknown mechanism (Adz- 
hubei et al., 2010; Clarke et al., 2008) (Figure 6B). These edgetic 
mutations perturb five of the ten interaction partners of the WT 
gene product. The majority of perturbed partners are expressed 
in muscle, the tissue most relevant to this disease (Figure 6C). 
One of the disrupted interactions is the interaction between 
TPM3 and troponin, which was shown to be vital for the transduction of calcium-induced signals required for muscle contraction
(Gunning et al., 1990). Two other perturbed interactors, HSF2, 
involved in myotube regeneration (McArdle et al., 2006), and 
CCHCR1, required for cytoskeleton organization (Tervaniemi 
et al., 2012), could also be of disease relevance. In contrast to
these edgetic mutations, the quasi-WT mutation M9R causes a 
different disease, nemaline myopathy. M9R might affect actin 
binding, thus leading to the formation of abnormal nemaline 
rods (Laing et al., 1995). 

The possible disease relevance of our approach was further 
illustrated by edgetic mutations in the gene EFHC1, mutations
in which can cause epilepsy. One perturbed partner, ZBED1, 
plays a role in a major cell proliferation pathway affected by 
EFHC1 knockouts (Yamashita et al., 2007), while another per- 
turbed interactor, TCF4, is required for neuronal differentiation 
(Flora et al., 2007) (Figure 6D). 

We next reasoned that mutations perturbing a greater number 
of interactions would be likely to have a larger impact on protein 
function, and hence result in more severe phenotypic effects. We 
used the age of disease onset as a proxy for severity and deter- 
mined whether an increase in the fraction of interactions lost 
correlated with an increase in severity for each pair of mutations 
causing the same disease (as annotated by HGMD) (Figure 6E 
and Table S5D). We found that mutations perturbing more PPIs 
were associated with an earlier age of disease onset significantly 
more often than random expectation (Figure 6E). Although 
computational predictions based on PolyPhen-2 were able to 
distinguish between interaction-perturbing versus non-perturb- 
ing alleles (Figure 5A), they did not perform as well as our 
approach in predicting disease severity (Figure S6I). This limita- 
tion is consistent with the inability of PolyPhen-2 to distinguish 
between edgetic and quasi-null mutations (Figure 5A). 
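The severity analysis (Figure 6E) compares pairs of mutations causing the same disease and asks whether the one losing a larger fraction of PPIs has the earlier onset. The legend states only that the observed value is compared with 100,000 random controls; the sketch below assumes one simple null, randomly swapping the two onset ages within each pair, and should be read as an illustration rather than the study's procedure. Function names and inputs are hypothetical.

```python
import random


def concordant_fraction(pairs):
    """Fraction of informative pairs in which the mutation losing more PPIs has the earlier onset.

    pairs: list of ((loss_fraction_1, onset_1), (loss_fraction_2, onset_2)) for two
    mutations causing the same disease. Pairs tied in either value are skipped.
    """
    informative = concordant = 0
    for (loss1, onset1), (loss2, onset2) in pairs:
        if loss1 == loss2 or onset1 == onset2:
            continue
        informative += 1
        if (loss1 > loss2) == (onset1 < onset2):
            concordant += 1
    return concordant / informative if informative else float("nan")


def permutation_p_value(pairs, n_controls=100_000, seed=0):
    """One-sided empirical p-value: how often randomly swapped onsets do at least as well."""
    rng = random.Random(seed)
    observed = concordant_fraction(pairs)
    hits = 0
    for _ in range(n_controls):
        # Assumed null: within each pair, the two onset ages are swapped with probability 1/2.
        shuffled = [
            ((l1, o2), (l2, o1)) if rng.random() < 0.5 else ((l1, o1), (l2, o2))
            for (l1, o1), (l2, o2) in pairs
        ]
        if concordant_fraction(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_controls + 1)
```

With only three fully concordant pairs, the swap null gives roughly a 1-in-8 chance of doing as well, which is why many more pairs are needed before such a trend becomes significant.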

Protein-DNA Interactions 

We hypothesized that mutations for which no PPI perturbation 
has yet been detected likely cause changes in other types of 
molecular interactions. As a proof-of-concept, we examined 
the effect of disease mutations on protein-DNA interactions 
(PDIs) between human transcription factors (TFs) (Reece-Hoyes
et al., 2011a) and developmental enhancers (Fuxman Bass et al.,
2015). Our hmORFeome1.1 mutant library contains 70 TF ORFs
altogether harboring 173 mutations (Table S6A). A primary
screen using enhanced yeast one-hybrid (eY1H) assays (Reece-Hoyes et al., 2011b) identified PDIs between 152 enhancers
(Visel et al., 2007) and 28 WT TFs (Figure 1C and Extended
Experimental Procedures). We then performed pairwise assays
to compare the PDIs of mutant TFs and their WT counterparts
in eY1H assays (Table S6B).

Using systematic PDI profiling, we determined edgotype clas- 
ses for 58 mutations in 22 TFs that bound at least two enhancers. 
We identified 38% of the mutations as quasi-null, 43% as 
edgetic (loss or gain of interaction), and 19% as quasi-WT (Figure 7A). More than 80% of TF missense disease mutations tested
either abrogated DNA binding or caused partial change of PDIs. 
Interestingly, almost half of the mutations are edgetic, chal- 
lenging the assumption that TF mutations that affect DNA bind- 
ing do so in a similar fashion across their targets. Among these, a 
significant fraction of mutations exhibit gain of PDIs, likely 
because these mutations cause a reduction in DNA-binding 
specificity and allow greater promiscuity in target recognition. 
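The three edgotype classes can be expressed as a comparison between the WT and mutant interaction sets. The rules below are a simplified reading of the definitions used in the text (quasi-null: no interactions left; quasi-WT: unchanged profile; edgetic: any partial loss or gain), and the minimum of two WT partners mirrors the restriction to TFs binding at least two enhancers. The function name and the handling of edge cases are assumptions.

```python
def edgotype(wt_partners, mutant_partners):
    """Assign an edgotype class by comparing WT and mutant interaction sets.

    Simplified rules (an assumption, not the study's exact criteria):
      quasi-WT:   the mutant profile is identical to WT
      quasi-null: the mutant retains no interactions at all
      edgetic:    anything else (some interactions lost and/or gained)
    """
    wt, mut = set(wt_partners), set(mutant_partners)
    if len(wt) < 2:
        # Mirrors the text: only TFs whose WT binds at least two enhancers were classified.
        raise ValueError("need at least two WT interactions to classify a mutant")
    if mut == wt:
        return "quasi-WT"
    if not mut:
        return "quasi-null"
    return "edgetic"


wt = {"enhancer_1", "enhancer_2", "enhancer_3"}   # hypothetical WT targets
print(edgotype(wt, set()))                         # quasi-null
print(edgotype(wt, {"enhancer_1"}))                # edgetic (partial loss)
print(edgotype(wt, wt | {"enhancer_4"}))           # edgetic (gain)
print(edgotype(wt, wt))                            # quasi-WT
```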

Given that TFs interact with their DNA targets through DNA-binding domains (DBDs), we assessed whether disease mutations perturbing PDIs are enriched within DBDs. Mutations within
versus outside DBDs exhibited strikingly different PDI perturbation patterns (p = 1.1 x 10“^; Figure 7B and Table S6C). Among
quasi-null mutations, the proportion of mutations within DBDs
was ~10-fold higher than outside DBD regions. These results
confirm that most PDI perturbing mutations reside within the 
DBDs of proteins, further supporting the quality and validity of 
our PDI perturbation data. 
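The DBD comparison in Figure 7B (like those in Figures 4F and 5A) uses a one-sided Wilcoxon rank-sum test. For small samples the test can be computed exactly by enumerating all group assignments, as in the sketch below; the inputs would be per-mutation percentages of PDI loss inside versus outside DBDs, and none of the study's data are included here. For realistic sample sizes, `scipy.stats.mannwhitneyu(x, y, alternative="greater")` is the usual library route.

```python
from itertools import combinations
from math import comb


def rank_sum_greater(x, y):
    """Exact one-sided Wilcoxon rank-sum test (H1: values in x tend to exceed those in y).

    x, y: lists of numbers. Every assignment of the pooled ranks to a group of
    size len(x) is enumerated, so this is only practical for small samples.
    Tied values receive their average rank. Returns (rank sum of x, p-value).
    """
    pooled = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        i = j
    ranks = [rank_of[v] for v in x + y]
    n, k = len(ranks), len(x)
    observed = sum(ranks[:k])
    at_least = sum(
        1 for idx in combinations(range(n), k)
        if sum(ranks[t] for t in idx) >= observed - 1e-9
    )
    return observed, at_least / comb(n, k)
```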

We reasoned that mutations within the same TF that cause different PDI changes
could affect the expression of different targets, resulting in
different diseases. We examined disease-causing TF mutations 
in pleiotropic genes associated with two or more diseases. 
Mutations with different PDI edgotype classes were likely to be 
associated with different clinical manifestations (Figure 7C), 
consistent with our results for PPI perturbations (Figure 6A). 

Of the disease mutations for which both PPI and PDI data were 
available, about half did not perturb any PPIs (Figure 7D). Inter- 
estingly, for ~80% of these we did identify PDI perturbations. 
For instance, mutations in the TGF-β-induced transcription factor TGIF1 cause holoprosencephaly (Gripp et al., 2000). While
the two mutant variants S28C and P63R are still able to bind their 
protein partners CTBP1 and CTBP2 (quasi-WT for PPI), both mu- 
tations completely abrogated the ability of TGIF1 to bind any of 
the tested DNA targets (quasi-null for PDI) (Figure S7A). Clearly, 
integrating different types of molecular interactions will enhance 
our ability to understand specific mechanisms that underlie 
many genetic disorders. 

To gain further insights into alternative molecular interaction 
perturbations, we computationally examined the effect of dis- 
ease mutations on protein-chemical interactions (Reva et al., 
2011). We found that the frequency with which disease muta- 
tions are at protein-chemical interfaces is significantly higher 
than that of non-disease variants (Figure S7B). In addition, 
disease mutations that perturb PPIs have no discernible tendency to locate at protein-chemical interfaces (Figure S7C),
suggesting that protein-protein and protein-chemical interfaces 
do not tend to overlap. Interestingly, ~13% of PPI non-perturb- 
ing mutations are located at protein-chemical interfaces, 
supporting the conclusion that these mutations could cause 
disease through perturbation of alternative types of molecular 
interactions. 

We combined computational predictions and interaction 
profiling to optimize our performance in disease mutation strati- 
fication. Although computational methods such as PolyPhen-2 
could predict interaction-perturbing alleles as deleterious (Fig- 
ure 5A), they fail to explain many disease-causing mutations, 
and misclassify them as “benign” (Figure S7D). Among these 
misclassified mutations, ~50% could be explained by molecular 
interaction perturbations (PCI, PPI, or PDI). For instance, the 
S140F mutation in PKP2 encoding the adhesion protein plako- 
philin leads to arrhythmogenic right ventricular dysplasia (Gerull 
et al., 2004). While PolyPhen-2 predicts S140F as benign, the 
S140F mutant exhibited increased binding to the chaperones 
HSC70 and BAG2, and lost all the PPIs of the WT protein (Table 
S7A). Altogether, existing computational methods alone fail to
precisely predict disease causality. Examining different types
of molecular interaction perturbations is critical for a full comprehension of disease-causing mutations in humans.
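The stratification described above, asking how many PolyPhen-2 "benign" disease mutations are nonetheless explained by an interaction perturbation, amounts to a filter and a count. A minimal sketch, assuming each mutation is a record with a PolyPhen-2 label and boolean PCI, PPI, and PDI perturbation flags (hypothetical field names); the records in the usage are invented for illustration and are not study data.

```python
def explained_fraction(mutations):
    """Fraction of PolyPhen-2 "benign" disease mutations with any interaction perturbation.

    mutations: list of dicts with key "polyphen" ("benign", "possibly damaging",
    or "probably damaging") and boolean flags "pci", "ppi", "pdi" (hypothetical layout).
    """
    benign = [m for m in mutations if m["polyphen"] == "benign"]
    if not benign:
        return float("nan")
    explained = sum(1 for m in benign if m["pci"] or m["ppi"] or m["pdi"])
    return explained / len(benign)


# Invented records for illustration only.
records = [
    {"id": "mut_1", "polyphen": "benign", "pci": False, "ppi": True, "pdi": False},
    {"id": "mut_2", "polyphen": "benign", "pci": False, "ppi": False, "pdi": False},
    {"id": "mut_3", "polyphen": "probably damaging", "pci": False, "ppi": True, "pdi": False},
]
print(explained_fraction(records))  # 0.5
```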
Figure 6. Heterogeneous Genetic Mutations Give Rise to Diverse Disease Outcomes through Distinct Interaction Perturbations

(A) Schematic of pleiotropic disease outcomes resulting from distinct interaction patterns (edgotypes) caused by distinct mutations. Percentage of mutation pairs causing different diseases out of all pairs with different or the same edgotype classes is shown, n = 52. Error bars, SE of the proportion, p values by one-sided Fisher’s exact test.

(B) Example of edgotyping four disease mutations in the pleiotropic gene TPM3.

(C) Most perturbed partners of TPM3 are expressed in the disease-relevant tissue.

(D) Edgetic mutations in EFHC1 perturb epilepsy-related protein partners.

(E) Correlation between the fraction of PPI perturbation and age of onset for mutation pairs causing the same disease, p values by comparing the observed value to 100,000 random controls (n = 13; Extended Experimental Procedures).

See also Figure S6.

Figure 7. Integration of Protein-Protein and Protein-DNA Interaction Perturbations

(A) PDI edgotype distribution for disease mutations in 22 TFs that bind to more than one enhancer.

(B) Histogram showing percentage of mutations within and outside DBDs as a function of the percentage of PDI loss. Numbers on x axis indicate bin range, p values by one-sided Wilcoxon rank sum test.

(C) Percentage of TF mutation pairs that cause different diseases out of all pairs with different or the same PDI edgotype classes (n = 17). Error bars, SE of the proportion, p values by one-sided Fisher’s exact test.

(D) PPI-PDI integration enables mutation characterization at higher resolution. Percentage of mutations is shown for: PPI and PDI unperturbed; PPI unperturbed and PDI perturbed; PPI perturbed and PDI unperturbed; and PPI and PDI perturbed in the integrated network.

See also Figure S7.

DISCUSSION

In this systematic characterization of mutations across various human Mendelian disorders, we have found surprisingly widespread disease-specific perturbations of macromolecular interactions. Approximately 60% of disease-associated missense mutations perturb PPIs, among which half result in complete loss of interactions, generally caused by protein misfolding and impaired expression, and the other half lead to edgetic perturbations. Importantly, different mutations in the same gene frequently result in different interaction perturbation profiles. This strongly suggests that the “edgotype” of a mutation represents a fundamental link between genotype and phenotype.

Our systematic edgotyping strategy provides a practical 
approach to classifying candidate disease alleles emerging 
from genome-wide association studies and from sporadic and 
somatic mutation sequencing approaches. Edgotyping achieves 
a high precision in identifying candidate disease-causing muta- 
tions based on the interaction perturbations relative to WT alleles 
(Figure S6A). However, the overall sensitivity of an edgotyping 
approach is compromised due to the false negative rate inherent 
to the assays used. We expect that a significant fraction of var- 
iants currently viewed as non-interaction-perturbing (quasi-WT) 
will eventually be proven to be edgetic and possibly cause 
disease. This circumstance likely arises from the incomplete 
nature of current human interactome network maps (Rolland
et al., 2014). Nevertheless, because edgetic mutations cannot
become quasi-WT or quasi-null even as interactome maps
improve, our estimate of edgetic mutations already provides a
reliable lower bound for their frequency.

An alternative possibility is that quasi-WT mutations affect dis- 
ease phenotypes through perturbation of different types of mo- 
lecular interactions. Biological signaling is regulated at multiple 
levels, and various types of molecular interactions are involved 
(Sahni et al., 2013) as we have shown for PPI and PDI networks. 
In addition, protein-RNA (Lee et al., 2006) and protein-metabolite 
(Carpten et al., 2007) interactions have also been shown to be 
involved in disease. Perturbations of these alternative interaction
networks are likely to result in distinct disease consequences. One can envision that integration of additional types
of interaction perturbation information with computational predictions will be necessary for a complete understanding of the
cellular networks governing a particular disease state (Figure S7D). As a major benefit, perturbed interactions spotlight
specific targets and pathways that are altered in a patient-specific context. This type of information could provide a much-needed guide in efforts to develop better diagnostic tools
and more personalized medical treatments.

EXPERIMENTAL PROCEDURES 

Using ORFs in the human ORFeome v8.1 collection as template, we PCR
amplified the two DNA fragments flanking the mutations, followed by a fusion
PCR to stitch the fragments together. The resulting fusion ORFs harboring the
mutations were Gateway cloned into the Donor vector pDONR223 to derive 
Entry clones (Rual et al., 2004), which were subsequently verified by next-gen- 
eration sequencing (Yang et al., 2011). 
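The mutagenesis step above (two flanking PCR fragments stitched together by fusion PCR) relies on a pair of overlapping internal primers that carry the substituted codon. A minimal sketch of how such primers could be derived, assuming a fixed flank length on each side of the codon; real primer design usually also targets melting temperature and avoids secondary structure, and the function and parameter names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(orf: str, codon_number: int, new_codon: str, flank: int = 15):
    """Return (mutant ORF, internal forward primer, internal reverse primer).

    The two internal primers are exact reverse complements carrying the new codon.
    The 5' fragment is amplified with an ORF-start primer plus the internal reverse
    primer, the 3' fragment with the internal forward primer plus an ORF-end primer;
    their shared overlap lets a fusion PCR join the fragments.
    """
    start = (codon_number - 1) * 3
    if codon_number < 1 or len(new_codon) != 3 or start + 3 > len(orf):
        raise ValueError("codon_number or new_codon out of range")
    mutant = orf[:start] + new_codon + orf[start + 3:]
    left = max(0, start - flank)
    right = min(len(mutant), start + 3 + flank)
    forward = mutant[left:right]
    return mutant, forward, reverse_complement(forward)


# Hypothetical toy ORF (Met-Ala-Ser-Lys-Gly-stop): change codon 3 (Ser, TCT) to Phe (TTT).
print(mutagenic_primers("ATGGCTTCTAAAGGTTGA", 3, "TTT", flank=3))
```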

Interaction with chaperones and other QCFs was measured using a quantitative LUMIER assay (Taipale et al., 2012; Taipale et al., 2014). All wild-type
and mutant allele clones were transferred via Gateway recombination into a
mammalian expression vector containing a C-terminal 3xFLAG-V5 tag. Stable
HEK293T cell lines expressing luciferase-QCF fusion proteins were generated
by lentiviral infection, and plasmids carrying wild-type and disease mutation
alleles were transfected into the stable HEK293T lines (Taipale et al., 2012).
Following capture of FLAG-tagged proteins, luminescence was measured to
determine QCF-target interaction. Following luminescence measurement,
FLAG-tagged mutant and wild-type proteins were detected as described 
(Taipale et al., 2012). 



We performed a binary protein-protein interaction screen for all mutant and
wild-type alleles as baits against ~7,200 human prey proteins (Rual et al.,
2004). The identified interactions were combined with the known pairs cataloged by the human binary interaction dataset HI-II-14 (Rolland et al., 2014).
All first-pass pairs from the primary Y2H screens were subjected to pairwise
testing in which all interactors of any allele of a gene were then tested against
all alleles of that gene. The resulting verified protein-protein interaction profiles
of disease mutants were compared with their wild-type counterparts. We validated perturbed and unperturbed interactions from mutation-mediated
interaction perturbation data (“edgotyping” data) using an orthogonal in vivo
Gaussia princeps luciferase protein complementation assay (GPCA). Human
HEK293T cells were co-transfected with each construct expressing complementary fragments of the Gaussia luciferase fused in frame with the tested protein pairs, and luciferase activity was measured as described (Cassonnet et al.,
2011).
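The pairwise retest rule above (every interactor found with any allele of a gene is retested against every allele of that gene) is what makes WT and mutant profiles directly comparable. A sketch of building that retest list, assuming first-pass hits are stored per gene and per allele; the data layout and identifiers are hypothetical.

```python
from itertools import product


def pairwise_test_plan(first_pass_hits):
    """Build the pairwise retest list for Y2H edgotyping.

    first_pass_hits: dict mapping gene -> dict mapping allele (e.g. "WT" or a
    mutation name) -> set of prey IDs found in the primary screen.
    Returns a list of (gene, allele, prey) tuples: every prey found with any
    allele of a gene is paired with every allele of that gene.
    """
    plan = []
    for gene, alleles in first_pass_hits.items():
        union_of_preys = set().union(*alleles.values())
        for allele, prey in product(sorted(alleles), sorted(union_of_preys)):
            plan.append((gene, allele, prey))
    return plan


# Hypothetical first-pass hits: the mutant found a prey (P3) the WT did not, and vice versa.
hits = {"GENE_A": {"WT": {"P1", "P2"}, "M1": {"P3"}}}
for row in pairwise_test_plan(hits):
    print(row)
```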

An enhanced yeast one-hybrid (eY1H) assay was used to detect binary protein-DNA interactions (PDIs) between a DNA bait and a protein prey (Reece-Hoyes et al., 2011a; Reece-Hoyes et al., 2011b). DNA baits corresponding
to human enhancers were retrieved from the Vista Enhancer Browser (http://enhancer.lbl.gov) (Visel et al., 2007). Protein preys were a set of TFs for which
mutant clones are available in our human mutation ORFeome version 1.1. We
performed pairwise eY1H assays of an arrayed collection of TF preys
comprising all the wild-type TFs and their mutant clones against 152 available
enhancer baits.

Disease-causing mutations were annotated by HGMD, and the deleteriousness of amino acid substitutions was predicted by the PolyPhen-2 program (Adzhubei et al., 2010). For structural features, distinct mutations were compared
with respect to protein domains from the Pfam database, and interaction 
interfaces on co-crystal structures from PDB. Tissue-specific gene expression 
was analyzed with normalized RNA-seq data from Human Body Map 2.0 
(GSE30611). Network properties analyzed included betweenness centrality, 
k-core centrality, degree, and closeness centrality (de Nooy et al., 2005). 
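Three of the network properties listed above can be computed with a few lines of breadth-first search and core peeling on an undirected graph stored as an adjacency dictionary. This is a plain-Python sketch for illustration (betweenness is omitted for brevity), and closeness here is normalized over reachable nodes only, which is one of several conventions. Libraries such as NetworkX provide `degree`, `closeness_centrality`, `core_number`, and `betweenness_centrality` for production use.

```python
from collections import deque


def degree(graph):
    """Number of neighbors of each node; graph maps node -> set of neighbors."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def closeness(graph, source):
    """(reachable nodes - 1) / sum of shortest-path distances from `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def core_numbers(graph):
    """k-core number of each node, by repeatedly removing a minimum-degree node."""
    deg = degree(graph)
    remaining = set(graph)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=deg.get)
        k = max(k, deg[node])   # core level never decreases during peeling
        core[node] = k
        remaining.remove(node)
        for v in graph[node]:
            if v in remaining:
                deg[v] -= 1
    return core


# Hypothetical toy network: triangle A-B-C with a pendant node D attached to C.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(degree(g), closeness(g, "C"), core_numbers(g))
```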

Full details are provided in the Extended Experimental Procedures. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven 
figures, and seven tables and can be found with this article online at http://dx.doi.org/10.1016/j.cell.2015.04.013.
SUMMARY

Gene regulatory networks (GRNs) comprising inter- 
actions between transcription factors (TFs) and reg- 
ulatory loci control development and physiology. 
Numerous disease-associated mutations have been 
identified, the vast majority residing in non-coding 
regions of the genome. As current GRN mapping 
methods test one TF at a time and require the use 
of cells harboring the mutation(s) of interest, they 
are not suitable to identify TFs that bind to wild- 
type and mutant loci. Here, we use gene-centered 
yeast one-hybrid (eY1H) assays to interrogate binding of 1,086 human TFs to 246 enhancers, as well
as to 109 non-coding disease mutations. We detect
both loss and gain of TF interactions with mutant
loci that are concordant with target gene expression
changes. This work establishes eY1H assays as a
powerful addition to the toolkit for mapping human
GRNs and for the high-throughput characterization
of genomic variants that are rapidly being identified 
by genome-wide association studies. 

INTRODUCTION 

Gene regulatory networks (GRNs) comprising physical and 
functional interactions between transcription factors (TFs) and 
regulatory elements play a critical role in development and phys- 
iology (Davidson et al., 2002; Walhout, 2006). Consequently, 
inappropriate gene regulation underlies a variety of human dis- 
eases. A broad variety of disease-associated mutations have 
been uncovered, including mutations in TF-encoding genes as 
well as mutations in non-coding sequences such as enhancers 
and promoters. Importantly, ~90% of disease-associated vari- 
ants identified by genome-wide association studies (GWAS) 
reside in the non-coding part of the genome (Hindorff et al., 
2009; Maurano et al., 2012), and a main challenge is to determine
the interactions with TFs that may be perturbed as a conse- 
quence of such mutations. 

TF-DNA interactions can be mapped with either “TF-centered” (protein-to-DNA) or “gene-centered” (DNA-to-protein) methods (Figure 1A) (Arda and Walhout, 2009; Deplancke et al., 2006). Chromatin immunoprecipitation (ChIP) is the most
widely used TF-centered method to identify the DNA regions 
with which a TF interacts in vivo. The last decade has seen an 
explosion of ChIP data. While progress has been impressive, 
several challenges remain. First, even for large consortia such 
as ENCODE, ChIP data have been generated for only ~150 of 
the ~1,500 human TFs (Gerstein et al., 2012). This is because 
ChIP critically depends on suitable anti-TF antibodies, which
are only available for a minority of human TFs. Second, each 
TF has been assayed only in a limited number of cell lines and 
conditions. Third, ChIP may work better for some TFs than for 
others. For instance, TFs with restricted expression patterns 
and/or expressed at low levels may be less amenable to ChIP 
compared to highly and broadly expressed TFs. Fourth, ChIP 
is not optimal for characterizing disease-associated mutations 
in large-scale, high-throughput settings, because it requires dis- 
ease cells or tissues that harbor the relevant mutation, which 
may be difficult to obtain. Finally, ChIP cannot be used to identify 
TFs with altered binding to mutant regulatory regions ab initio 
because the method is TF-centered and as a result one needs 
to first identify candidate TFs and then test these one at a time. 

Enhanced yeast one-hybrid (eY1H) assays provide a gene-centered method for the detection and identification of TF-DNA interactions (Reece-Hoyes et al., 2011b, 2013; Arda et al., 2010; Brady et al., 2011; Fuxman Bass et al., 2014; Martinez et al., 2008). Briefly, eY1H assays measure TF-DNA interactions in the milieu of the yeast nucleus. DNA regions to be assayed (DNA baits) are fused upstream of two reporter genes, LacZ and HIS3, and integrated into the yeast genome, enabling their incorporation into chromatin. TFs (preys) are introduced into the DNA bait strains by mating using a robotic platform and are tested in quadruplicate, providing an inherent interaction retest (Figure 1B).

Here, we test our human eY1H platform (Reece-Hoyes et al., 2011a) to identify TFs interacting with human enhancers and to determine protein-DNA interaction changes caused by mutant TFs as well as non-coding disease-associated mutations. We find that eY1H assays more effectively retrieve TFs with limited expression patterns or levels when compared to ChIP. We provide examples of functional models of target sharing by TFs, including redundancy, which may provide robustness, and opposing function (activation versus repression), which can ensure proper timing of enhancer activity during development. Finally, we demonstrate that eY1H assays can be effectively used to identify changes in TF binding conferred by disease-associated coding or non-coding mutations.

Figure 1. Gene-Centered Yeast One-Hybrid Assays

(A) Gene-centered versus TF-centered approaches for mapping protein-DNA interactions. Rectangles, regulatory regions; ellipses, TFs.

(B) Cartoon of eY1H assays. A DNA sequence of interest is cloned upstream of two reporter genes (HIS3 and LacZ) and integrated into the yeast genome (i.e., each DNA bait is tested in duplicate by activation of each reporter in the same yeast nucleus). The resulting yeast DNA bait strain is mated to a collection of yeast strains harboring TFs fused to the Gal4 activation domain (AD). Positive interactions are determined by the ability of the diploid yeast to grow in the absence of histidine and overcome the addition of 3AT, a competitive inhibitor of the HIS3 enzyme, and to turn blue in the presence of X-gal. Each TF is tested in quadruplicate. Red boxes show positive interactions.

RESULTS 

A Gene-Centered Human TF-Enhancer Interaction Network

We first focused on a set of human developmental enhancers 
that were previously tested for embryonic activity in mouse 
transgenic assays at day E11.5 (Table S1) (Visel et al., 2007). 
We expanded the human eY1H platform to 1,086 full-length human
TFs (76% of all 1,434) and examined interactions for 360
enhancers. Thus, in total, we tested 390,960 putative TF-DNA
interactions. To ensure the technical quality of the data, we only
considered interactions in which both eY1H reporters and at
least two of the four colonies of a TF quadrant tested positive.
For the majority of cases, all four colonies scored positively.
Limitations and advantages of eY1H data are discussed extensively
below. The resulting TF-enhancer interaction network contains
2,230 interactions between 246 enhancers and 283 TFs (Figure 2A;
Table S2).
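The interaction-calling rule above can be written as a small filter. This is a minimal sketch under stated assumptions: each of the four colonies in a TF quadrant is assumed to yield one HIS3 and one LacZ readout, and a colony is assumed to count as positive only when both reporters are active in it. The function and variable names are hypothetical. The last lines recompute the screen size quoted in the text.

```python
# Hypothetical sketch of the eY1H interaction-calling rule described above.
# Assumption: a colony counts as positive only if BOTH reporters (growth on
# 3AT for HIS3, blue color on X-gal for LacZ) are active in that colony.

def is_positive_interaction(his3, lacz, min_colonies=2):
    """Return True if at least `min_colonies` colonies of a TF quadrant
    score positive for both reporters.

    his3, lacz: sequences of booleans, one entry per colony (four per quadrant).
    """
    if len(his3) != len(lacz):
        raise ValueError("each colony needs a readout for both reporters")
    double_positive = sum(1 for h, l in zip(his3, lacz) if h and l)
    return double_positive >= min_colonies


# Screen size reported in the text: every TF tested against every enhancer.
N_TFS, N_ENHANCERS, N_HUMAN_TFS = 1086, 360, 1434
total_tested = N_TFS * N_ENHANCERS               # 390,960 TF-enhancer pairs
coverage_pct = round(100 * N_TFS / N_HUMAN_TFS)  # ~76% of human TFs

if __name__ == "__main__":
    print(total_tested, coverage_pct)
    print(is_positive_interaction([True] * 4, [True, True, False, False]))
```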

We ascertained the quality of the network using several 
different metrics. First, we observed a statistically significant 
overlap between eY1H interactions and the presence of TF binding
sites, which indicates that most interactions are likely direct
(Figure 2B, p < 0.0001). Second, we found a statistically significant
overlap between eY1H and ENCODE ChIP interactions
(Gerstein et al., 2012) (Figure 2C, p < 0.0001). Third, TFs that
interact with developmental enhancers are enriched for those
expressed early in development, which is consistent with the
timing of enhancer activity (Figure 2D). Finally, the network is
enriched for homeodomain TFs, well-known regulators of
developmental gene expression (Chi, 2005) (Figure S1). This enrichment
is specific to the developmental enhancers because we did not
observe it with the eY1H data set related to disease mutations
that is discussed below (Figures 2E and S1). The network is
depleted for interactions involving ZF-C2H2 TFs (Figures 2F
and S1) and, consistently, these TFs are generally expressed at
later stages in development. Importantly, however, ZF-C2H2
TFs that do interact with the developmental enhancers are
expressed at earlier stages than those that do not (Figure 2D).

To further assess the quality of the network, we reasoned that 
if a TF truly binds an enhancer in vivo, the TF would be expressed 
at the same time and place where the enhancer is active. Indeed, 
we found a modest but significant overlap between enhancer 
activity and spatiotemporal TF expression (Figure 2G). We
wondered whether the enrichment was relatively modest because the
TFs identified in eY1H assays are a mixture of transcriptional
activators and repressors. We observed a more striking
enrichment for known transcriptional activators, whereas repressors
are not enriched (Figure 2G; Table S3). Altogether, these findings
provide general support for the quality of the human TF-enhancer
interaction network.

eY1H Assays Provide a Powerful Addition to the GRN
Mapping Toolkit 

Several of our findings demonstrate that eY1H assays are
complementary to other TF-DNA interaction mapping methods. For
instance, we found that the fraction of eY1H interactions also
detected by ChIP is larger for TFs that have been assayed by ChIP in
multiple cell lines (Figure 2H). This underscores that ChIP in a
given tissue or cell line uncovers only the subset of interactions in
which the relevant TF engages in that context, whereas eY1H assays
interrogate the available repertoire of TFs for a given enhancer in a
single experiment. Further, TFs that interact with developmental
enhancers in eY1H assays exhibit more tissue-specific expression
compared to all TFs tested or to TFs assayed by ChIP (Figure 2I). In
addition, TFs assayed by ChIP are expressed at higher levels than
those detected in eY1H assays (Figure 2J). These observations
indicate that each method has particular strengths in detecting
certain types of TFs. Indeed, we detected interactions for 82
TFs that had not been detected by any other high-throughput
method (Figure 2K) (Badis et al., 2009; Jolma et al., 2013).

The TF-Enhancer Network Reveals Functional 
Relationships between TFs 

In eY1H assays, DNA baits often interact with multiple members
of the same TF family (Reece-Hoyes et al., 2013). This is likely 
because such TFs have similar DNA binding domains and recog- 
nize similar DNA sequences (Badis et al., 2009; Grove et al., 
2009; Weirauch et al., 2014). To visualize enhancer sharing by 
TFs, we calculated the target profile similarity for each pair of 
TFs: i.e., the number of overlapping enhancer targets relative 
to the number of targets that interact with either TF (Fuxman 
Bass et al., 2013). We delineated a TF association network in 
which TFs with target profile similarity >0.2 are connected (Fig- 
ures 3A and S2). As expected, TFs generally cluster by family. 
Further, there is a significant correlation between DNA binding 
domain identity, DNA motif similarity, and target profile similarity 
(Figures 3B, 3C, and S3). However, similar to our observations in 
C. elegans, there are many examples of TF pairs with high DNA 
binding motif similarity but low target profile similarity (Figure 3C) 
(Reece-Hoyes et al., 2013). 
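Target profile similarity as defined above is the Jaccard index of two TFs' enhancer target sets. The sketch below, with hypothetical function names and made-up toy data, computes it and builds the TF association network using the >0.2 edge threshold from the text. The stricter cutoff used for homeodomains in Figure 3A could be passed as `threshold`.

```python
from itertools import combinations


def target_profile_similarity(targets_a, targets_b):
    """Shared enhancer targets divided by targets bound by either TF
    (the Jaccard index of the two target sets)."""
    union = targets_a | targets_b
    if not union:
        return 0.0
    return len(targets_a & targets_b) / len(union)


def tf_association_network(targets_by_tf, threshold=0.2):
    """Connect every TF pair whose target profile similarity exceeds `threshold`.

    targets_by_tf: dict mapping TF name -> set of enhancer IDs it binds.
    Returns a list of (tf1, tf2, similarity) edges.
    """
    edges = []
    for tf1, tf2 in combinations(sorted(targets_by_tf), 2):
        sim = target_profile_similarity(targets_by_tf[tf1], targets_by_tf[tf2])
        if sim > threshold:  # strictly greater, as in ">0.2"
            edges.append((tf1, tf2, sim))
    return edges


if __name__ == "__main__":
    # Toy, made-up targets (not data from the study).
    toy = {
        "TF_A": {"enh1", "enh2", "enh3"},
        "TF_B": {"enh2", "enh3", "enh4"},
        "TF_C": {"enh5"},
    }
    print(tf_association_network(toy))  # [('TF_A', 'TF_B', 0.5)]
```

A pair that shares exactly one fifth of its combined targets (similarity 0.2) is not connected, because the threshold is strict.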

The sharing of enhancers by paralogous TFs raises the question
of whether only one of these actually interacts with that
DNA fragment in vivo, or whether there could be a biological
explanation for enhancer sharing between TFs. Conceptually, there are
several possibilities. First, two TFs may share enhancers in the
same tissue at the same time to provide redundancy that can
lead to robustness of enhancer function when one TF is genetically
or environmentally perturbed (MacNeil and Walhout,
2011). Second, TFs may bind the same enhancer, but in different
tissues or at different developmental times. Finally, TFs that
share enhancers could have opposing regulatory effects where
one activates and the other represses transcription, for instance
at different developmental stages.

There are several examples of redundancy between TF paral- 
ogs. For instance, three ETS TFs share targets in human T cells 
and function redundantly (Hollenhorst et al., 2007). If redundancy 
is prevalent in human GRNs, one would expect that TFs that 
share targets would also tend to be co-expressed. Indeed, TFs 
that bind to highly overlapping sets of enhancers are generally 
more co-expressed than TFs that bind different enhancers
(Figure 3D). An example is a group of six redundant Abdominal-B
(Abd-B) HOX TFs (Maconochie et al., 1996) that bind a highly
overlapping set of enhancers in eY1H assays (Figure 3A, blue
box). These TFs are also highly co-expressed, and none of
them is essential for viability, although overall 60% of TFs in
the network confer lethality when knocked out in mouse.
Importantly, TF pairs with both high target profile similarity and high
co-expression similarity are overall enriched for pairs in which both
TFs are non-essential (Figure 3E). Altogether, these results suggest
potentially widespread redundancy between TFs.

TFs that share a large proportion of targets could have 
opposing functions if one is an activator and the other is a 
repressor. For instance, the activator LHX4 and two repressors, 
LHX6 and HESX1, share a large proportion of enhancers
(Figure 3F). HESX1 and LHX6 can both repress activation by LHX4
in transient transfection assays (Figures 3G and 3H). LHX4 is ex- 
pressed after HESX1 in the developing pituitary and before LHX6 
in the developing CNS (Figure 3F), suggesting that HESX1 may 
prevent precocious activation by LHX4 in the developing pitui- 
tary, while LHX6 repression prevents prolonged activation by 
LHX4 in the developing CNS. Thus, the network can identify TF 



Figure 2. A Human Gene-Centered TF-Enhancer Interaction Network 

(A) The TF-enhancer interaction network comprises 2,230 interactions between 246 human developmental enhancers and 283 TFs. Enhancers that are active in a 
single tissue at day E11.5 (top nodes) or multiple tissues (bottom nodes) are connected to the TFs (middle yellow nodes) with which they interact. 

(B and C) eY1H interactions significantly overlap with the occurrence of known TF binding sites (B) and ChIP peaks (C). The Venn diagrams (top) illustrate the 
number of overlapping interactions. The eY1H network was randomized 20,000 times by edge switching (Martinez et al., 2008) and the overlap in each ran- 
domized network was calculated (bottom panel). The numbers under the histogram peaks indicate the average overlap in the randomized networks. The red 
arrows indicate the observed overlap in the real network. 
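The randomization in (B) and (C) keeps each TF's and each enhancer's number of interactions fixed while shuffling which TF binds which enhancer. Below is a minimal sketch of degree-preserving edge switching for a bipartite TF-enhancer network. It assumes a simple rule that rejects swaps which would duplicate an existing edge; the number of swaps per randomization (about 10 per edge), the default of 1,000 randomizations (the study used 20,000), and the add-one empirical p-value are our assumptions, not details from the paper. Function names are hypothetical.

```python
import random


def switch_edges(edges, n_switches, rng):
    """Degree-preserving randomization of a bipartite (TF, enhancer) edge list.

    Repeatedly picks two edges (t1, e1), (t2, e2) and rewires them to
    (t1, e2), (t2, e1), rejecting swaps that would duplicate an edge.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_switches):
        i, j = rng.sample(range(len(edges)), 2)
        (tf1, en1), (tf2, en2) = edges[i], edges[j]
        new1, new2 = (tf1, en2), (tf2, en1)
        if tf1 == tf2 or en1 == en2 or new1 in present or new2 in present:
            continue  # swap would be a no-op or create a duplicate edge
        present.difference_update((edges[i], edges[j]))
        present.update((new1, new2))
        edges[i], edges[j] = new1, new2
    return edges


def overlap_significance(edges, reference, n_random=1000, seed=0):
    """Compare the observed overlap with `reference` (e.g., ChIP-supported
    TF-enhancer pairs) against overlaps in randomized networks.

    Returns (observed overlap, mean randomized overlap, empirical p-value).
    """
    rng = random.Random(seed)
    observed = len(set(edges) & reference)
    n_switches = 10 * len(edges)  # assumption: ~10 swap attempts per edge
    null = [
        len(set(switch_edges(edges, n_switches, rng)) & reference)
        for _ in range(n_random)
    ]
    p_value = (1 + sum(o >= observed for o in null)) / (1 + n_random)
    return observed, sum(null) / n_random, p_value
```

Because each accepted swap moves one edge endpoint between two enhancers and one between two TFs, every node keeps its degree, so any excess overlap in the real network cannot be explained by highly connected TFs or enhancers alone.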

(D) Timing of expression during mouse development for homeodomain (HD) and ZF-C2H2 families. The fraction of TFs whose expression was detected at a 
particular Theiler Stage during development is shown. *p < 0.01 versus eY1H-negative TFs by Fisher’s exact test. 

(E and F) Percentage of TFs or interactions involving homeodomains (E) or ZF-C2H2 TFs (F) for two data sets. Statistical significance determined by proportion 
comparison test. 

(G) Overlap between enhancer activity and TF expression pattern. The fraction of TF-enhancer pairs that overlap in expression was compared between inter- 
acting and non-interacting pairs. The same analysis was performed for known activators and repressors. Statistical significance was determined using Fisher’s 
exact test. 

(H) The fraction of eY1H interactions that were also detected by ChIP, partitioned by the number of cell lines in which a particular TF was tested by 
ChIP. p = 0.041 by Mann-Whitney’s U test. Error bars represent SEM. 

(I) Tissue specificity score for TFs detected by eY1H (n = 266), ChIP (n = 96) or all TFs present in the eY1H array (n = 896), based on their expression levels across 
34 tissues (Ravasi et al., 2010). This score quantifies the departure of the observed TF expression pattern from the null distribution of uniform expression across all 
tissues, using relative entropy. Each box spans from the first to the third quartile, the horizontal lines inside the boxes indicate the median value and the whiskers 
indicate minimum and maximum values. Statistical significance determined by Mann-Whitney’s U tests. 
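The score in (I) can be computed as the relative entropy (Kullback-Leibler divergence) between a TF's normalized expression profile across tissues and the uniform distribution. A minimal sketch, assuming expression values are normalized to sum to one and that log base 2 is used (the base only rescales the score); the function name is hypothetical.

```python
import math


def tissue_specificity(expression):
    """Relative entropy (in bits) between a TF's expression profile across
    tissues and a uniform profile over the same tissues.

    0 means equal expression in every tissue; the maximum, log2(n_tissues),
    means expression confined to a single tissue.
    """
    n = len(expression)
    total = sum(expression)
    if n == 0 or total <= 0 or any(x < 0 for x in expression):
        raise ValueError("expression must be non-negative with a positive sum")
    score = 0.0
    for x in expression:
        if x > 0:  # by convention 0 * log(0) contributes nothing
            p = x / total
            score += p * math.log2(p * n)  # p * log2(p / (1/n))
    return score
```

With 34 tissues, as in the panel, the score ranges from 0 for a ubiquitously and evenly expressed TF to log2(34) for a TF detected in one tissue only.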

(J) Maximum expression level across 34 tissues, obtained from Ravasi et al. (2010), plotted for each TF detected by eY1H (n = 266), ChIP (n = 96), or all TFs 
present in the eY1H array (n = 896). Each box spans from the first to the third quartile, the horizontal lines inside the boxes indicate the median value 
and the whiskers indicate minimum and maximum values. Statistical significance determined by Mann-Whitney’s U tests. 

(K) Venn diagram depicting the overlap between TFs detected by eY1H and those detected by high-throughput SELEX (HT-SELEX), ChIP-seq, and protein 
binding microarrays (PBMs). 

See also Figure S1 and Tables S1, S2, and S3. 
pairs that bind to similar targets but that may have opposing
functions in target gene regulation. This may be crucial for tightly
controlling the expression of particular programs during development.
Altogether, these data indicate that multiple TFs from
the same family found to interact with overlapping sets of
enhancers in eY1H assays may be relevant in vivo and can provide
different gene regulatory functionality.

Human Disease and TF Network Connectivity 

Mutations in TF coding sequences can cause a variety of dis- 
eases that could impact GRNs in different ways. Some muta- 
tions may abrogate DNA binding completely while others affect 
binding to only a subset of targets. We hypothesized that muta- 
tions in TFs that bind a large set of targets are more likely to affect 
an important biological function. It has been shown previously 
that TFs that bind to many promoters in C. elegans are more 
frequently essential for viability than TFs that only bind a few pro- 
moters (Deplancke et al., 2006). Similarly, protein-protein inter- 
action hubs are more frequently essential (Goh et al., 2007). 
Interestingly, a combined protein-protein and protein-DNA inter- 
action degree was more strongly associated with phenotypic 
output for human TFs than either degree alone. Essential and
disease-associated TFs have a higher combined degree than
non-essential TFs and TFs not associated with disease, respectively
(Figures 4A and 4B). In addition, there is a significant
correlation between combined TF degree and the density of somatic
mutations in cancer (Figure 4C). This is specific
to somatic mutations, as no correlation was observed between
TF degree and the density of protein-altering variants in healthy
individuals from the 1000 Genomes Project (Abecasis et al.,
2010) (Figure 4D). In sum, mutations in highly connected TFs
more frequently affect phenotypic outcomes leading to disease.

Disease-Associated TF Coding Mutations 

Both coding (in TFs) and non-coding (in regulatory DNA
elements) mutations can cause human disease, likely by changing
target gene expression in trans or cis, respectively. Such
disease-associated mutations can potentially affect GRNs by: (1)
complete loss of all TF-DNA interactions, (2) loss of a subset of
interactions, (3) gain of interactions, or (4) a combination of
interaction loss and gain. However, because no suitable methods
were available to discriminate between these possibilities, it
remains unclear, in the vast majority of cases, which TF-DNA
interactions are lost or gained as a result of specific mutations.

We hypothesized that eY1H assays would be highly suitable to
interrogate differential TF binding caused by disease-associated
mutations because: (1) mutations can readily be introduced in
DNA baits or TF preys by molecular cloning, circumventing the
need for patient samples harboring mutant TFs or mutant
regulatory sequences; (2) eY1H assays enable direct, unbiased
comparisons between wild-type and mutant TFs; and (3) eY1H
assays test all available TFs in parallel in one experiment,
enabling the direct determination of differential TF binding to
mutant regulatory DNA sequences.

To determine the impact of TF coding mutations on enhancer 
binding, we first focused on four mutant LHX4 TFs that confer 
pituitary hormone deficiency (Pfaeffle et al., 2008; Tajima et al., 
2007) (Figure 5A). The P389T mutation is located outside the 
DNA binding domain, and this mutant retains most (18 of 19)
protein-DNA interactions (Table S4). However, two mutations in the
homeodomain (L190R and A210P) result in complete loss of in- 
teractions, which is consistent with previous in vitro experiments 
(Pfaeffle et al., 2008) (Figures 5A and 5B; Table S4). Interestingly, 
we detected partial loss and gain of weak interactions caused by 
the R84C mutation, which is located in the LIM domain and is 
known to modulate DNA binding (Pfaeffle et al., 2008). These 
results were further confirmed with luciferase assays in which 
we found changes in the transcriptional activation capacity 
that correlate with changes in DNA binding (Figure 5C). 

We also evaluated two missense mutations in the homeo- 
domain of HESX1: the R160C mutation leads to septo-optic 



Figure 3. TF Redundancy and Opposing Functions 

(A) TF association network. Each node represents a TF and edges connect TFs with a target profile similarity >0.2 (left, all TF families) or >0.45 (right, 
homeodomains). TFs with degree >3 in the eY1H network are shown. Node color indicates TF families. The blue square highlights a set of HOX Abd-B TFs 
discussed in the main text. AP2, activating protein 2; bZIP, Basic Leucine Zipper Domain; bHLH, basic helix-loop-helix; HD, homeodomain; HMG, High-Mobility 
Group; MH1, Mad homology 1; WH, Winged Helix; ZF-C2H2, Zinc Finger C2H2; ZF-DHHC, Zinc Finger DHHC; ZF-NHR, Nuclear Hormone Receptor. 

(B) Target profile similarity between TFs according to DNA binding domain identity. For each pair of TF paralogs with different DNA binding domain amino acid 
identity, their target profile similarity was determined. Each box spans from the first to the third quartile, the horizontal lines inside the boxes indicate the median value 
and the whiskers indicate minimum and maximum values. All pairwise comparisons between groups are significant (p < 0.01) by Dunn’s multiple comparison test. 

(C) Correlation between motif similarity and target profile similarity. For each TF pair, target profile similarity was plotted against their DNA motif similarity, 
determined as the Pearson correlation coefficient of the Z scores obtained for all possible 8-mers in protein binding microarrays. 
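The motif similarity in (C) is a plain Pearson correlation over 8-mer Z scores. A minimal sketch, assuming each TF's protein binding microarray results are available as a mapping from 8-mer to Z score and that only 8-mers scored for both TFs are compared; how reverse-complement 8-mers were collapsed is not stated here. Names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def motif_similarity(z_scores_a, z_scores_b):
    """Correlate two TFs' PBM 8-mer Z scores over the 8-mers scored for both.

    z_scores_a, z_scores_b: dicts mapping an 8-mer string -> Z score.
    """
    shared = sorted(z_scores_a.keys() & z_scores_b.keys())
    return pearson([z_scores_a[k] for k in shared],
                   [z_scores_b[k] for k in shared])
```

Two TFs whose Z scores rise and fall together across 8-mers score close to 1 even if one binds more strongly overall, which is why high motif similarity does not by itself guarantee shared enhancer targets.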

(D) Histogram of spatiotemporal co-expression for TF pairs according to their target profile similarity. Statistical significance determined by Mann-Whitney’s 
U tests. 

(E) Redundancy between TFs. Each pair of TF paralogs was binned according to their target profile similarity and according to their spatiotemporal co-expression. 
The percentage of TF pairs for which both TF knockouts are viable was determined. Statistical significance was determined using the proportion 
comparison test. 

(F) Top: overlap between enhancers bound by LHX4, LHX6, and HESX1. Bottom: cartoon of developmental expression. Red, transcriptional activator; green, 
transcriptional repressor. 

(G) HESX1 represses LHX4-induced enhancer activity. HEK293T cells were co-transfected with enhancer constructs cloned upstream of a Firefly luciferase 
reporter vector and the indicated TF expression vectors. After 48 hr, cells were harvested and luciferase assays were performed. Relative luminescence activity is 
plotted as fold change compared to cells co-transfected with control vector expressing GFP. Experiments were performed three times in three to six replicates. 
Average relative luminescence activity ± SEM is plotted. *p < 0.05 by Student’s t test. 

(H) LHX6 represses LHX4-induced enhancer activity. Experiments were performed three times in three to six replicates. Average relative luminescence activity ± 
SEM is plotted. *p < 0.05 by Student’s t test. 

See also Figures S2 and S3. 
dysplasia (Dattani et al., 1998), whereas the N125S variant is a
natural polymorphism in the Afro-Caribbean population
(Brickman et al., 2001) that is not associated with disease.
Interestingly, the R160C mutation completely abolishes all interactions,
while the N125S variant has a wild-type target profile (Figures 5D
and 5E; Table S4). Wild-type HESX1 and the N125S variant
repressed reporter gene expression in transient transfection
assays, whereas the R160C mutant did not, further confirming our
findings (Figure 5F). Altogether, these data show that eY1H
assays can be effectively used to determine the consequences of
TF-coding mutations on DNA target binding.

Non-Coding Mutations Associated with Human Disease 

To determine the effect of non-coding mutations on TF binding, 
we selected 227 disease-associated mutations, affecting the 
expression of 137 genes. We identified interacting TFs for both 
wild-type and mutant clones of each regulatory element with 



Figure 4. Relationship between TF Connec- 
tivity and Human Disease 

(A) Cumulative distribution of TF protein-DNA 
interaction (PDI), protein-protein interaction (PPI), 
and combined degrees for essential and non- 
essential TFs. Combined TF degree is defined as 
the product of PPI and PDI degrees and repre- 
sents the number of paths connecting the protein 
interactors of a TF with its DNA targets. Statistical 
significance determined by Mann-Whitney’s 
U tests. 
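The combined degree in (A) multiplies a TF's PPI and PDI degrees because each path from a protein interactor, through the TF, to a DNA target pairs one interactor with one target. The sketch below makes that counting argument explicit and computes the empirical cumulative distributions of the kind plotted in (A) and (B); the interactor and target identifiers and function names are hypothetical placeholders.

```python
from bisect import bisect_right
from itertools import product


def combined_degree(ppi_partners, dna_targets):
    """Number of interactor -> TF -> DNA target paths through one TF.

    Each path pairs one protein interactor with one DNA target, so the
    count equals len(ppi_partners) * len(dna_targets).
    """
    return sum(1 for _ in product(ppi_partners, dna_targets))


def cumulative_distribution(values):
    """Empirical CDF: fraction of values <= each observed value."""
    ordered = sorted(values)
    n = len(ordered)
    return {v: bisect_right(ordered, v) / n for v in ordered}
```

A TF with no known protein partners (or no DNA targets) therefore has a combined degree of zero, regardless of its other degree.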

(B) Cumulative distribution of TF degrees for TFs 
reported as disease-associated genes in the Hu- 
man Gene Mutation Database (HGMD) and genes 
not reported in HGMD. Statistical significance 
determined by Mann-Whitney’s U tests. 

(C) Correlation between TF degree and the 
number of protein-altering SNPs and short indel 
variants per 100 amino acids in cancer samples 
obtained from the Catalogue of Somatic Mutations 
in Cancer (COSMIC). Statistical significance was 
determined using Pearson correlation coefficient. 

(D) Correlation between TF degree and the
number of protein-altering SNPs and short indel
variants per 100 amino acids in the 1000 Genomes
Project. Statistical significance was determined
using Pearson correlation coefficient.



eY1H assays and detected differential
TF binding for 109 mutations (75 genes)
associated with a variety of diseases
(Figures 6A and 6B; Table S5). Literature
searches indicate that 66 of these
mutations result in an increase, whereas 39 confer
a decrease, in expression of the associated
target gene (for four mutations the
effect on gene expression is not known;
Table S5). The majority of mutations
resulted in interaction loss (64 of 109, or
59%). Remarkably, however, 32
mutations resulted in gain of interactions
(29%) and 13 caused both interaction
loss and gain (12%) (Figure 6C). Thus, gain of TF interactions
may be a pervasive disease-causing mechanism. Overall, these
mutations affect interactions with 111 TFs from all major families
(Table S5). Strikingly, TFs involved in differential interactions are
more frequently essential for viability and/or annotated as
disease-associated in HGMD compared to TFs that are not involved
in differential interactions (Figure 6D).
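As a quick consistency check on the counts above, the three binding-change categories are mutually exclusive and sum to the 109 mutations with differential binding, and the expression-change counts (66 + 39 + 4) sum to the same total. The sketch below simply recomputes the quoted percentages from the numbers given in the text.

```python
# Counts of non-coding mutations by type of differential TF binding (from the text).
effects = {
    "interaction loss only": 64,
    "interaction gain only": 32,
    "both loss and gain": 13,
}

total = sum(effects.values())  # 109 mutations with differential binding
percentages = {effect: round(100 * n / total) for effect, n in effects.items()}

# Reported direction of target gene expression change for the same mutations.
expression_effects = {"increase": 66, "decrease": 39, "unknown": 4}
assert sum(expression_effects.values()) == total

if __name__ == "__main__":
    for effect, pct in percentages.items():
        print(f"{effect}: {effects[effect]}/{total} = {pct}%")
```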

To validate the differential eY1H interactions, we first 
compared them to published differential interactions. Out of 
227 mutations tested, 54 had reported differential interac- 
tions that were experimentally supported by reporter assays, 
in vitro binding assays, and/or ChIP. For 34 of these, the
reported TF was either absent from our collection (19 mutations)
or was never detected in eY1H assays (15 mutations). We de- 
tected the reported differential interaction for four of the remain- 
ing 20 mutations and differential interaction with a close TF 
paralog for three additional mutations (Table S6). Importantly, 
Figure 5. Disease-Associated Coding Mutations in TFs 

(A) Four missense mutations in LHX4 were tested for loss or gain of protein-DNA interactions in eY1H assays against 152 enhancers. The top panel depicts a 
cartoon of LHX4, including the location of the mutations and the homeodomain (HD) and LIM domains. The bottom panel shows the number of interactions 
retained (black bar), lost (red bar) or gained (blue bar) for each mutant compared to wild-type interactions. 

(B) Examples of interactions lost and gained for LHX4 missense mutations. Each TF-enhancer combination was tested in quadruplicate three times. One random 
quadruplicate test is shown corresponding to four enhancers. Red squares, interaction lost with TF mutant; blue square, gained interaction with TF mutant; AD 
vector, empty prey vector. 

(C) Transcriptional activation mediated by wild-type and mutant LHX4 alleles. HEK293T cells were co-transfected with enhancer constructs cloned upstream of a 
Firefly luciferase reporter vector and the indicated TF expression vectors. Relative luminescence activity is plotted as fold change compared to cells co- 
transfected with empty expression vector. Experiments were performed four times with three replicates each. Average relative luminescence activity ± SEM is 
plotted. *p < 0.05 versus empty expression vector by Student’s t test. 

(D) Two missense mutations in HESX1 were tested for changes in protein-DNA interactions as in (A). 

(E) Examples of interactions lost for HESX1 missense mutations. 

(F) Repression of LHX4-induced enhancer activity by wild-type and mutant HESX1 alleles. HEK293T cells were co-transfected with enhancer constructs cloned 
upstream of a Firefly luciferase reporter vector and the indicated TF expression vectors. Relative luminescence activity is plotted as fold change compared to cells 
co-transfected with control vector expressing GFP. Experiments were performed six times with three replicates each. Average relative luminescence activity ± 
SEM is plotted. *p < 0.05 by Student’s t test. 

See also Table S4. 



the TFs detected by eY1H assays were not tested in the latter 
three studies, even though they have similar DNA binding 
specificity and are expressed in the relevant disease tissue 
(Table S5). Hence, it could be that either or both TF(s) contribute 
to the disease in vivo. 

Next, we devised a “supporting evidence score” (see 
Extended Experimental Procedures) for each interaction 
involving a mutant regulatory element, in which we weighted 
the interactions according to: (1) co-expression of the differen- 
tially interacting TF and the target gene in disease-relevant tis- 
sues, (2) if the differentially bound TF is associated with a similar 



disease or mouse phenotype as the target gene mutation, and (3) 
if the target gene expression change caused by the mutation (in- 
crease or decrease) was concordant with gain/loss of a protein- 
DNA interaction with an activator/repressor (Figure 6E). It is of 
course important to note that the data used in this integration 
are not yet complete and have their own confidence issues. 
Out of the 294 differential interactions (with 109 non-coding mu- 
tants), 98 have a medium/high to high level of confidence (Table 
S5). Importantly, the differential interactions involving TFs ex- 
pressed in disease-relevant tissues and/or associated with a 
similar disease are generally consistent with changes in target 
Figure 6. Disease-Associated Non-Coding Mutations 

(A) Number of mutations per gene for which differential TF interactions were detected by eY1H assays. 

(B) Distribution of diseases associated with tested non-coding mutations. 

(C) Distribution of mutations that result in loss of interactions, gain of interactions, or both. 

(D) Fraction of essential (per MGI) or disease-associated TFs (per HGMD) differentially interacting with non-coding mutations (differential TFs) and the remaining 
TFs in the eY1H human TF collection (non-differential TFs). Statistical significance determined by proportion comparison test. 

(E) Cartoon depicting data integration used to obtain a supporting evidence score for differential eY1H interactions (see Extended Experimental Procedures). 

(F and G) Percentage of differential TF-target gene pairs in which the TF is co-expressed with the target gene in the disease tissue (F) or is associated with a similar 
disease or mouse phenotype (G) for interaction changes concordant or discordant with target gene expression changes. Statistical significance determined by 
proportion comparison test. 



gene expression: an increase in expression is concordant 
with gain of interaction with an activator or loss of an interaction 
with a repressor, while a decrease in expression correlates with 
gain of interaction with a repressor or loss of interaction with an 
activator (Figures 6F-6H). All differential interactions, gene 
expression changes, as well as expression and disease informa- 
tion are provided in Table S5. 
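The concordance rule above is the third criterion of the supporting evidence score. Below is a minimal sketch of just that rule. The string labels and the choice to treat bifunctional (A/R) TFs as undetermined are our assumptions, and the weighting used to combine this criterion with co-expression and phenotype similarity (described in the Extended Experimental Procedures) is not reproduced here.

```python
def is_concordant(expression_change, interaction_change, tf_role):
    """Check whether a differential TF interaction explains the direction of
    a target gene expression change.

    expression_change: "increase" or "decrease"
    interaction_change: "gain" or "loss"
    tf_role: "activator", "repressor", or "bifunctional"
    Returns True/False, or None when the TF can act either way.
    """
    if expression_change not in ("increase", "decrease"):
        raise ValueError(f"unknown expression change: {expression_change!r}")
    if interaction_change not in ("gain", "loss"):
        raise ValueError(f"unknown interaction change: {interaction_change!r}")
    if tf_role == "bifunctional":
        return None  # direction cannot be predicted from TF function alone
    if tf_role not in ("activator", "repressor"):
        raise ValueError(f"unknown TF role: {tf_role!r}")
    # Gaining an activator or losing a repressor predicts higher expression;
    # gaining a repressor or losing an activator predicts lower expression.
    predicts_increase = (interaction_change == "gain") == (tf_role == "activator")
    return predicts_increase == (expression_change == "increase")
```

For example, a mutation that lowers target gene expression and abolishes binding of an activator is scored as concordant, while the same expression change paired with a gained activator interaction is not.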

Several mutations cause differential interactions with multiple 
TFs, often from the same family. Two examples illustrate how 
such interactions can be evaluated for in vivo relevance. The first 
example involves a C-to-T mutation in the beta globin gene
promoter that results in reduced gene expression leading to
thalassemia. This mutation results in loss of interactions with five
paralogous TFs that bind similar DNA sequences: KLF2, KLF4, KLF7,
KLF12, and KLF17 (Figure 6I). Two of these paralogs,
KLF2 and KLF4, are more likely involved than the other three
TFs, because they are expressed in erythroid cells and have
been shown to activate beta globin gene expression (Alhashem
et al., 2011; Gardiner et al., 2007). The second example involves
a T-to-C mutation in the CYBB promoter that causes a reduction
in expression leading to chronic granulomatous disease. eY1H
assays identified loss of binding for IRF2 and IRF5, both of which
are expressed in disease-relevant cells. Again, the mutation
occurs in the binding site of these TFs (Figure 6J). IRF2 has been
shown to activate the CYBB promoter (Luo and Skalnik, 1996)
and, therefore, it is likely that loss of this interaction is most
relevant to the disease. However, IRF5 cannot be entirely excluded
because these two TFs may share targets in vivo, as discussed
above for the developmental enhancers.

Dominant Mutations in the Sonic Hedgehog ZRS 
Enhancer 

We identified differential interactions for nine dominant muta- 
tions in the ZRS enhancer of SHH that result in ectopic gene 
expression along the anterior margin of the limb bud, causing 
digit malformations and polydactyly (Sharpe et al., 1999) (Fig- 
ure 7A; Table S5). Interestingly, we found both loss and gain of 
interactions with these mutations, involving many TFs that are 
expressed in the developing limb. Data integration showed that 
gain of interactions involving limb-expressed TFs mostly in- 
volves activators, while loss of interactions occurred more 
frequently with transcriptional repressors (p = 0.018, Figure 7B), 
both of which are concordant with the increased gene expres- 
sion elicited by these dominant mutations. Thus, similar diseases 
can result from gain or loss of TF interactions caused by different 
mutations within an enhancer. 

We characterized the 105C>G mutation in more detail. This
mutation results in gain of interactions with three AP2 TFs, two of 
which are expressed in the limb and could be responsible for the 
gain of SHH expression (Figures 7A and 7C). Indeed, this muta- 
tion creates a consensus AP2 binding site (Badis et al., 2009) 



(Figure 7D). TFAP2B is a transcriptional activator and activates
the mutant, but not the wild-type, enhancer in luciferase assays
(Figure 7E). Together, these results show that TFAP2B can bind and
activate the 105C>G mutant enhancer, suggesting that aberrant
binding of TFAP2B may result in the ectopic expression of
SHH, thereby causing digit malformations.

DISCUSSION 

This study presents a gene-centered human TF-enhancer
interaction network delineated by eY1H assays. The technical
quality of this network is ensured by the inherent retest of
interactions with two reporter genes and the testing of TFs in
quadruplicate, as well as by the high rate of reproducibility
demonstrated between independent experiments (~90%)
(Reece-Hoyes et al., 2011b, 2013). The biological quality of this
network is also high, as indicated by several metrics, including
significant overlap with TF binding sites, ChIP interactions, TF
expression and enhancer activity, enrichment for homeodomains,
and reporter assays. The relatively modest overlap
with ChIP data likely reflects that ChIP may retrieve indirect
TF interactions, as well as the limited sensitivity of ChIP data that
were obtained in only one or two cell types. Like any other
method, however, eY1H assays may also yield false positive
and false negative interactions with both enhancers and
disease-causing mutant elements. False positive interactions may be
retrieved when multiple members of the same TF family with
highly similar consensus binding sites are found to bind to the
same enhancer(s) and only a subset of these actually bind the
enhancer in vivo. Importantly, however, we illustrate several
mechanisms by which enhancer sharing can be biologically
meaningful in attaining redundancy or in the precise timing of
enhancer activity, for instance during development. The careful
integration of eY1H interactions with high-resolution
spatiotemporal expression and other types of data will, over time,
provide protein-DNA interaction data of increasing validity and
resolution.

The rate of false negatives in eY1H assays is likely to be
considerable (Walhout, 2011). For instance, TFs that exclusively
interact with DNA as heterodimers or after post-translational
modification by another human protein will not be detected.
In addition, eY1H assays cannot yet detect cooperative
interactions with multiple TFs. Therefore, the retrieval of several
known differential interactions with non-coding disease-causing
mutations in a single experiment is highly encouraging.

A particularly powerful feature of the eY1H approach is that
it uniquely enables the comparison of wild-type and mutant
TFs or regulatory elements in a single experiment and in a
high-throughput manner. Our findings show that both coding
mutations in TF-encoding genes and non-coding mutations in
regulatory sequences can result in rewiring of GRNs. While



(H) Number of interactions lost or gained involving activators (A), repressors (R), or bifunctional TFs (A/R, activators and repressors) for mutations that cause
increased or decreased target gene expression. Only interactions in which the TF is co-expressed with the target gene in disease-relevant tissue, or associated
with a similar disease or phenotype, are shown. Statistical significance was determined using Fisher’s exact test.

(I and J) Examples of differential eY1H interactions with the HBB promoter (I) and the promoter of the CYBB gene (J). Disease-associated mutations are indicated
in red. Reported TF binding site logos are shown (Weirauch et al., 2014).

See also Tables S5 and S6.
Figure 7. Mutations in the Limb SHH Enhancer

(A) Summary of interactions lost (red) or gained (blue) for different mutations in the ZRS enhancer of sonic hedgehog. Yellow circles, TFs expressed in limb during
development; black dots, activators; white dots, repressors; black/white dots, TFs that can act as both activators and repressors.

(B) Number of interaction changes occurring with limb-expressed activators, repressors or bifunctional TFs (activators/repressors) for interactions gained or lost 
in ZRS enhancer mutations, p = 0.018 by Fisher’s exact test. 

(C) Gain of interactions detected by eY1H assays in the 105C>G mutant in the ZRS enhancer of sonic hedgehog. Blue boxes indicate differential positive
interactions.

(D) DNA binding motifs for TFAP2A, TFAP2B, and TFAP2E discriminate wild-type and mutant enhancer sequences. 

(E) HEK293T cells were co-transfected with enhancer fragments containing wild-type (105C) or mutant (105G) sequences cloned upstream of a Firefly luciferase
reporter vector and the indicated TF expression vectors. Relative luminescence activity is plotted as fold change compared to cells co-transfected with control
vector expressing GFP. Experiments were performed four times in three to six replicates. Average relative luminescence activity ± SEM is plotted. *p < 0.05 by
Student’s t test.

See also Table S5. 



coding mutations cause mostly loss of protein-protein
interactions (Sahni et al., 2015, in this issue of Cell), protein-DNA
interaction changes caused by either coding or non-coding
mutations involve both gain and loss of interactions, sometimes with
the same mutation (Sahni et al., 2015). We provide a guide for
interpreting eY1H data with non-coding disease-causing
mutations. Specifically, we would prioritize differential interactions



involving TFs that are co-expressed with the target gene in the
disease-relevant tissue. Further, we emphasize that concordant
interactions, for instance increased gene expression accompanied by gain of
interaction with an activator or loss of interaction with a repressor, are more
likely to be relevant in vivo than other interactions. Obtaining additional
high-resolution gene expression and TF function data
will be critical for the continued integration not only of eY1H
data, but also of interaction data inferred by DNase I
hypersensitivity assays or predicted based on TF binding sites.
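The prioritization guide above can be made concrete with a short sketch. It assumes a hypothetical encoding of each differential interaction as the direction of target-gene expression change ("up"/"down"), the TF's role ("activator", "repressor", or "bifunctional"), and whether the interaction was "gained" or "lost"; the function names are illustrative and not part of any eY1H software. Because the figure legends test such counts with Fisher's exact test, a dependency-free two-sided version is included for a 2×2 table, for example gained/lost versus activator/repressor.

```python
from math import comb


def is_concordant(expression_change: str, tf_role: str, interaction_change: str) -> bool:
    """Return True if an interaction change agrees with the observed expression change.

    Hypothetical encoding: expression_change is "up" or "down"; tf_role is
    "activator", "repressor" or "bifunctional"; interaction_change is
    "gained" or "lost".
    """
    if tf_role not in ("activator", "repressor"):
        # Bifunctional TFs do not predict a single direction of change.
        return False
    # Gaining an activator or losing a repressor predicts increased expression;
    # losing an activator or gaining a repressor predicts decreased expression.
    predicts_increase = (tf_role == "activator") == (interaction_change == "gained")
    return predicts_increase == (expression_change == "up")


def _table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table with top-left cell a and fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = _table_probability(a, row1, col1, n)
    p_value = 0.0
    # Sum the probabilities of all tables with the same margins that are
    # no more likely than the observed table (small tolerance for rounding).
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = _table_probability(x, row1, col1, n)
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)
```

For example, `fisher_exact_two_sided([[3, 1], [1, 3]])` returns 34/70, about 0.486. Where SciPy is available, `scipy.stats.fisher_exact` should give the same two-sided p-value; the pure-Python version only keeps the sketch self-contained.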

Most variants identified by GWAS reside in non-coding
regions of the genome (Hindorff et al., 2009; Maurano et al.,
2012). We propose that eY1H assays will provide a facile method
with which differential TF interactions involving these variants
can be analyzed. Overall, this work provides an initial blueprint
to study enhancer networks, as well as to determine how
network connectivity is affected in disease.

EXPERIMENTAL PROCEDURES 
eY1H Assays 

Enhanced yeast one-hybrid (eY1H) assays were performed as described
(Reece-Hoyes et al., 2011b). This method detects protein-DNA interactions and
involves two components: a “DNA-bait” (e.g., a gene promoter or enhancer)
and a “TF-prey.” We generated DNA-bait strains for 360 human
developmental enhancers selected from the VISTA Enhancer Browser
(http://enhancer.lbl.gov; Table S1). Enhancers (0.4–2.4 kb) were amplified by PCR
(Table S7) from human genomic DNA (Clontech) and were then Gateway-cloned
(Reece-Hoyes et al., 2011b). Entry clones were sequenced using
PacBio (Yale Center for Genomic Analysis; Table S8). The DNA-baits were
cloned upstream of two Y1H reporter genes (LacZ and HIS3), and both
DNA-bait::reporter constructs were integrated into the yeast genome to generate
chromatinized “DNA-bait strains.” Yeast strains that express different TFs
fused to the activation domain (AD) of yeast Gal4 were mated with the
DNA-bait strain. If a TF binds the regulatory region, the AD moiety activates reporter
gene expression. LacZ activation was detected via the conversion of colorless
X-gal into a blue compound, while HIS3 expression allows the yeast to grow on
media lacking histidine and to overcome the addition of 3-amino-triazole (3AT),
a competitive inhibitor of the His3 enzyme (Deplancke et al., 2004;
Reece-Hoyes and Walhout, 2012). We updated the previously published arrayed
collection of 988 human TFs (Reece-Hoyes et al., 2011a) by adding 146 TFs
and removing 48 for which the clone turned out to be incorrect, was truncated,
or did not encode the DNA binding domain. The resulting collection contains
one variant each of 1,086 full-length TFs (76% of all 1,434 human TFs, Table S9).
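The size of the updated TF collection follows from the numbers above; as a worked check (our arithmetic, with the percentage rounded as in the text):

```latex
\[
988 + 146 - 48 = 1{,}086 \ \text{TFs}, \qquad
\frac{1{,}086}{1{,}434} \approx 0.757 \approx 76\%
\]
```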

eY1H assays were performed using a Singer robot that manipulates yeast
strains in a 1,536-colony format. Images of readout plates lacking histidine
and containing 3AT and X-gal were processed using the Mybrid web tool to
automatically detect positive interactions (Reece-Hoyes et al., 2013). Each
interaction was tested in quadruplicate, and only those that were positive in at
least two of the four colonies were considered genuine (Reece-Hoyes et al., 2011b). However,
the vast majority of interactions detected (~90%) were supported by all four
colonies, as previously published (Reece-Hoyes et al., 2011b). Interactions
detected by Mybrid were then manually curated: false positives detected by
Mybrid on plates with uneven background were removed, and we added
interactions that Mybrid missed (false negatives), for instance because they occur
next to very strong positives or occur with baits that exhibit high background
reporter gene expression. Positive colonies were sequenced to determine
prey identity. Fourteen quads in the array were removed from the interaction
list because they did not match the expected TF (see Extended Experimental
Procedures). A total of 2,230 high-quality protein-DNA interactions between 246
enhancers and 283 TFs were included in the final data set (Table S2).
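The quadruplicate scoring rule can be sketched as follows. This assumes each tested TF-bait pair is reduced to four booleans, one per colony, after image analysis; the function names are hypothetical, and the manual curation steps described above are not modeled.

```python
def is_genuine(quad: list[bool]) -> bool:
    """Call an interaction positive if at least two of its four colonies are positive."""
    if len(quad) != 4:
        raise ValueError("each interaction is tested in quadruplicate")
    return sum(quad) >= 2


def fully_supported_fraction(quads: list[list[bool]]) -> float:
    """Fraction of called interactions that are supported by all four colonies."""
    called = [quad for quad in quads if is_genuine(quad)]
    if not called:
        return 0.0
    return sum(all(quad) for quad in called) / len(called)
```

Applied to all called interactions, `fully_supported_fraction` is the quantity behind the ~90% of interactions supported by all four colonies.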

Target Profile Similarity 

Target profile similarity between TFs was calculated using the Jaccard index 
as the number of enhancer targets shared between two TFs A and B divided 
by the number of enhancers that interact with either A or B (Fuxman Bass 
et al., 2013). Target profile similarities range from 0 to 1, with 0 indicating no 
target overlap and 1 indicating complete target overlap. 
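The target profile similarity defined above is a Jaccard index over enhancer target sets. A minimal sketch, assuming each TF's targets are stored as a set of enhancer identifiers (the names below are placeholders):

```python
from itertools import combinations


def target_profile_similarity(targets_a: set[str], targets_b: set[str]) -> float:
    """Jaccard index: shared enhancer targets divided by all targets of either TF."""
    union = targets_a | targets_b
    if not union:
        # Assumption: two TFs without any targets are treated as having no overlap.
        return 0.0
    return len(targets_a & targets_b) / len(union)


def pairwise_similarities(tf_targets: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute target profile similarity for every unordered pair of TFs."""
    return {
        (tf_a, tf_b): target_profile_similarity(tf_targets[tf_a], tf_targets[tf_b])
        for tf_a, tf_b in combinations(sorted(tf_targets), 2)
    }
```

For example, two TFs with targets {enh1, enh2, enh3} and {enh2, enh3, enh4} share two of four enhancers and score 0.5.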

Mutated Regulatory Regions 

Mutant DNA baits were generated by introducing mutations in the primers used in
the PCR step prior to generating entry clones (Table S10). Yeast DNA-bait
strains were sequenced to verify the mutation and ensure the absence of
additional mutations. eY1H screens were performed for two or three independent
yeast strains per construct. Interactions that occurred with at least two out of
three or two out of two of the strains were considered positive, while
interactions not occurring in any of the strains were considered negative (Table S5).
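The calling rule for mutant baits can be sketched as below, assuming each construct's results are given as one boolean per independent yeast strain. The text does not say how a single positive strain was treated, so returning "inconclusive" in that case is an assumption, as is the hypothetical `differential_call` helper that compares wild-type and mutant calls for the same TF.

```python
def call_strain_results(strain_results: list[bool]) -> str:
    """Summarize eY1H results from two or three independent yeast strains per construct."""
    if len(strain_results) not in (2, 3):
        raise ValueError("expected results from two or three independent strains")
    n_positive = sum(strain_results)
    if n_positive >= 2:
        return "positive"  # two of two, or at least two of three strains
    if n_positive == 0:
        return "negative"  # no strain shows the interaction
    return "inconclusive"  # assumption: one positive strain is left uncalled


def differential_call(wild_type: str, mutant: str) -> str:
    """Hypothetical comparison of wild-type and mutant calls for one TF."""
    if wild_type == "positive" and mutant == "negative":
        return "lost"
    if wild_type == "negative" and mutant == "positive":
        return "gained"
    return "unchanged or inconclusive"
```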
SUMMARY

Modulation of protein function is used to intervene 
in cellular processes but is often done indirectly by 
means of introducing DNA or mRNA encoding the 
effector protein. Thus far, direct intracellular delivery 
of proteins has remained challenging. We developed 
a method termed iTOP, for induced transduction 
by osmocytosis and propanebetaine, in which a 
combination of NaCl hypertonicity-induced
macropinocytosis and a transduction compound (propane-
betaine) induces the highly efficient transduction of 
proteins into a wide variety of primary cells. We 
demonstrate that iTOP is a useful tool in systems in 
which transient cell manipulation drives permanent 
cellular changes. As an example, we demonstrate 
that iTOP can mediate the delivery of recombinant 
Cas9 protein and short guide RNA, driving efficient 
gene targeting in a non-integrative manner. 

INTRODUCTION 

Modulation of protein function is a powerful means of inter- 
vention in disease. Protein manipulation is usually achieved 
indirectly, at the DNA or RNA level, either by “knockdown” or 
mutation of the encoding gene or by ectopic overexpression of 
wild-type or mutant genes. Transient, non-integrative modulation
of cell function by direct intracellular delivery of proteins
has appealing applications, both in research and in the clinic.
However, the currently available toolset for the intracellular
transduction of proteins is limited.

In 1982, Okada and Rechsteiner reported that brief hypertonic 
shock followed by a hypotonic treatment can induce the intra- 
cellular uptake of proteins into cells (Okada and Rechsteiner, 
1982). Unfortunately, this technique proved limited to immortal- 
ized cell lines and yielded poor protein transduction efficiencies 
and poor cell survival in primary cells. The discovery of cell- 
penetrating peptides (CPPs) sparked renewed interest in pro- 
tein-mediated cell manipulation. Independent discoveries from 
Green and Frankel demonstrated that the HIV TAT protein can
transduce itself across the cell membrane (Schwarze et al.,
2000). Nagahara and colleagues subsequently demonstrated
that TAT-peptide-mediated protein transduction also worked
when the TAT peptide was cloned as an in-frame fusion to a
recombinant protein of interest, providing a tractable method
for the transduction of recombinant proteins (Schwarze et al.,
2000). CPP-mediated protein transduction appears to work
with all cell types, but the dependence on physical fusion of the
CPP with the cargo protein can disrupt protein function or alter
the subcellular localization of the fusion protein (Lundberg and
Johansson, 2001). Thus, the success of CPP-driven protein
transduction is variable and depends on the nature and
physical properties of the protein cargo.

Here, we describe a method for the efficient delivery of native 
proteins and other macromolecules, such as small RNAs, into 
primary cells. We discovered that a combination of small mole- 
cules drives the highly efficient intracellular delivery of native 
proteins, independent of any transduction peptide. We termed 
this process “iTOP” for induced transduction by osmocytosis 
and propanebetaine. iTOP is an active uptake mechanism in 
which NaCl-mediated hyperosmolality, in combination with
a transduction compound (a propanebetaine), triggers
macropinocytotic uptake and intracellular release of extracellularly
applied macromolecules. We demonstrate that iTOP allows 
the highly efficient delivery of recombinant cytoplasmic and 
nuclear proteins into a wide variety of primary cell types. Finally, 
we demonstrate that iTOP of recombinant Cas9 protein and 
in-vitro-transcribed short guide RNA provides a highly efficient 
and non-integrative means of gene editing. 

RESULTS 

Transduction of Native Protein Independent
of a Cell-Penetrating Peptide

Protein transduction provides an attractive means to transiently 
manipulate cell behavior without risk of permanent changes to 
the cell’s genome. We set up a system for protein transduction 
into cells by generating recombinant Oct4-VP16 protein with 
an N-terminal histidine tag (H6) and a C-terminal poly-arginine
CPP (R11-CPP) to drive self-transduction of the protein across
the cell membrane (Zhou et al., 2006) (Figure 1A). To validate
the system, we also generated recombinant Oct4-VP16 protein 






Cell 



without the R11-CPP or without both the R11-CPP and histidine
purification tags (Figure 1A, top). We used a firefly luciferase
reporter plasmid containing six tandem Oct4 binding sites (Tomilin
et al., 2000) to measure intracellular activity of transduced Oct4
protein (Figure 1A, bottom). Luciferase activity was measured
12 hr after adding the recombinant Oct4 protein. Transduced
H6-Oct4-VP16-R11 protein activated the Oct4-luciferase
reporter in a dose-dependent manner (Figure 1B). Surprisingly,
addition of the H6-Oct4-VP16 protein (without R11-CPP) or
Oct4-VP16 protein (without R11-CPP and H6 tag) elicited the
same response (Figure 1B). This finding was highly unexpected,
yet the dose-dependent activation of the Oct4 reporter
suggested that Oct4 protein was incorporated into the cells
independent of the CPP sequence.

We hypothesized that one or more components of the buffer 
in which we purified the recombinant protein were responsible 
for the CPP-independent protein transduction. To test this, we 
examined the effect of omitting individual components of the 
buffer on Oct4 transduction. As shown in Figure 1C, Oct4-luciferase
reporter activation was abrogated when either NaCl or
non-detergent sulfobetaine-201 (NDSB-201) was omitted from
the buffer, indicating that a combination of NaCl and NDSB-201
is required for the introduction of Oct4 protein into cells.
Because omission of either compound could potentially affect
Oct4 protein solubility, it is important to note that the absence
of NaCl or NDSB-201 did not result in Oct4 protein precipitation
from the solution (data not shown). To exclude the possibility that
luciferase reporter activation occurred in an Oct4-protein-independent
manner, we also analyzed the effect of the Oct4
protein on a luciferase reporter without Oct4 binding sites. As
shown in Figure S1A, the Oct4 protein did not activate this
luciferase reporter, demonstrating that
the observed effect was indeed dependent on binding of the
transduced Oct4 protein to its target sites. Oct4 specificity was
further confirmed using a tandem Oct4-Sox2 reporter (Boyer
et al., 2005), on which recombinant Oct4 protein synergized with
Sox2, again demonstrating its functional specificity (Figure S1B).
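In Figures 1B and 1C, firefly luciferase activity is normalized to co-transfected Renilla luciferase; expressing the normalized signal relative to a control condition gives a fold change, as plotted in Figure 7E. A minimal sketch of that calculation, assuming readings are paired by well and using the mean of the control wells as the baseline (both assumptions, not the authors' analysis code):

```python
from statistics import mean, stdev


def firefly_over_renilla(firefly: list[float], renilla: list[float]) -> list[float]:
    """Normalize each well's firefly signal to its co-transfected Renilla signal."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired by well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(sample_ff: list[float], sample_rl: list[float],
                control_ff: list[float], control_rl: list[float]) -> tuple[float, float]:
    """Mean and SD of normalized sample activity relative to the mean control activity."""
    baseline = mean(firefly_over_renilla(control_ff, control_rl))
    ratios = [value / baseline for value in firefly_over_renilla(sample_ff, sample_rl)]
    return mean(ratios), stdev(ratios)
```

`stdev` requires at least two sample wells; for the SEM reported in Figure 7E, divide the SD by the square root of the number of replicates.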

Effect of Osmolality, NDSB-201, Time, and Protein 
Concentration 

To further examine the protein transduction parameters, we set 
up a more direct detection system to quantify transduced 
protein. We detected intracellular beta-lactamase, a small 
highly soluble protein (Figures S1C and S1D), using CCF2/AM,
a non-fluorescent lipophilic substrate that can readily cross the 
cell membrane (Figure 1D, see Experimental Procedures). 
Once in the cytosol, CCF2/AM is cleaved by cytosolic esterases, 
which activate its fluorescence and leave the now negatively 
charged form, CCF2, trapped inside the cell. Upon excitation 
at 409 nm, CCF2 emits (green) light at 520 nm. Cleavage of 
CCF2 by intracellular beta-lactamase results in a shift in 
the emission wavelength to blue (447 nm). Thus, the ratio of 
blue versus green signal accurately quantifies intracellular 
beta-lactamase. 
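The ratiometric readout described above can be sketched as follows. Background subtraction with cell-free wells and the function names are assumptions; scaling to the isotonic control (set at 1) follows the figure legends.

```python
from statistics import mean


def blue_green_ratio(blue_447: float, green_520: float,
                     blue_background: float = 0.0, green_background: float = 0.0) -> float:
    """Ratio of cleaved (blue, 447 nm) to uncleaved (green, 520 nm) CCF2 emission."""
    blue = blue_447 - blue_background
    green = green_520 - green_background
    if green <= 0:
        raise ValueError("green signal must exceed background")
    return blue / green


def relative_beta_lactamase_activity(sample_ratios: list[float],
                                     isotonic_ratios: list[float]) -> float:
    """Mean sample ratio relative to the isotonic control, which is set to 1."""
    return mean(sample_ratios) / mean(isotonic_ratios)
```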

Using this assay, we explored the effect of time, NaCl,
NDSB-201, and protein concentration on protein transduction
of murine embryonic fibroblasts (MEFs). We discovered that
the optimal transduction time was inversely related to the
level of NaCl-induced hyperosmolality, with higher osmolalities
resulting in faster protein transduction. Figure 1E integrates
the effect of transduction time and different media osmolalities
on beta-lactamase protein transduction. As a control,
beta-lactamase was transduced in isotonic media with addition of
NDSB-201 (open squares). At 800 mOsmol/kg, transduction
occurred rapidly (orange line), with an optimal transduction time of
1.5 hr. In contrast, at 500 mOsmol/kg, transduction proceeded
slowly, reaching optimal levels after 12 hr (red line). Intermediate
osmolalities resulted in correspondingly intermediate optimal transduction
times, as indicated (red and green lines). Extending protein
transduction beyond the optimal timeframe was detrimental
to the cells, resulting in lower transduction efficiency and cell
death (Figure 1E, dashed lines).
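The transduction media in these experiments are adjusted to defined osmolalities with NaCl. As a rough planning aid only, the sketch below estimates how much extra NaCl to add, assuming NaCl dissociates into two ions with an osmotic coefficient of about 0.93, that mM approximates mmol/kg water, and that the base medium is near 300 mOsmol/kg; all three defaults are assumptions, and the final osmolality should be measured with an osmometer.

```python
def nacl_to_add_mM(target_mosm_per_kg: float,
                   base_mosm_per_kg: float = 300.0,
                   other_solutes_mosm_per_kg: float = 0.0,
                   osmotic_coefficient: float = 0.93) -> float:
    """Rough estimate of extra NaCl (mM) needed to reach a target osmolality.

    Assumes each NaCl contributes about 2 * osmotic_coefficient osmoles and
    treats mM as mmol/kg water, which is only approximate for culture media.
    base_mosm_per_kg is a placeholder; use the measured value of your medium.
    """
    deficit = target_mosm_per_kg - base_mosm_per_kg - other_solutes_mosm_per_kg
    if deficit < 0:
        raise ValueError("target is below the starting osmolality")
    return deficit / (2 * osmotic_coefficient)
```

For a 700 mOsmol/kg medium that also contains 25 mM NDSB-201 (counted here, as a further assumption, as roughly 25 mOsmol/kg for a non-dissociating zwitterion), `nacl_to_add_mM(700, other_solutes_mosm_per_kg=25)` gives about 200 mM.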

We next tested whether other hypertonicity-inducing
molecules could mediate protein transduction as well. As shown in
Figure 1F, NaCl, RbCl, KCl, and LiCl all induced protein
transduction, at varying levels. In addition, Na-gluconate
supported protein transduction, indicating that the Cl⁻ anion
in the added NaCl is dispensable for this process. In contrast,
sucrose-, lactulose-, sorbitol-, and mannitol-induced
hypertonicity did not support beta-lactamase protein transduction
(Figure 1F). These data demonstrate that protein transduction
is critically dependent on hypertonicity induced by alkali metal
cation-containing salts, especially those of Na⁺ and Rb⁺.

Next, we explored the effect of the NDSB-201 concentration
on beta-lactamase transduction. As shown in Figure 1G,
beta-lactamase transduction was NDSB-201 dependent and
most efficient at NDSB-201 concentrations of 10–25 mM.
At high concentrations, NDSB-201 displays some
toxicity, resulting in a decrease in protein transduction
(Figure S1E).

Finally, we measured CCF2 cleavage as a function of
beta-lactamase concentration. Transduction of increasing
concentrations of beta-lactamase protein for 3 hr in the presence
of 25 mM NDSB-201 and NaCl-adjusted media (osmolality of
700 mOsmol/kg) resulted in increased intracellular
beta-lactamase activity (Figure 1H, bars). Beta-lactamase transduction
was not observed in the isotonic controls in the presence of
25 mM NDSB-201 (Figure 1H, open circles).

Altogether, these data demonstrate that recombinant beta-lactamase
protein is taken up by the cell and released into the
cytoplasm, and that this process depends on hypertonicity induced by
Na⁺- or Rb⁺-containing salts, NDSB-201 concentration, transduction
time, and extracellular protein concentration. We termed this
process “iTOP” (induced transduction by osmocytosis and
propanebetaine). Accurate determination of the amount of protein
transduced into the cytosol is extremely challenging and
depends on cell type, protein solubility, half-life, and the concentration
at which the protein can be applied to the cells. Indeed, Figure 1H
demonstrates a direct relationship between the extracellular
concentration of beta-lactamase protein and the amount transduced into a
cell. To provide a reference for the amount of transduced protein,
we compared iTOP protein transduction to CPP-dependent
protein delivery and to the protein transduction method reported
by Okada and Rechsteiner. iTOP delivery was at least
four times more efficient in transducing protein into primary
fibroblasts than these previously reported methods (Figure S1F).
Figure 1. A Combination of NaCl Hypertonicity and NDSB-201 Induces Transduction of Native Proteins

(A) Top: schematic representation of the Oct4 recombinant proteins used in this study. Bottom: timeline of Oct4 protein transduction assay. 

(B) COS-7 cells were transfected with 6xOct4-TK-Firefly-Luc reporter and, 12 hr later, were transduced with increasing amounts of Oct4 proteins as indicated.
Firefly luciferase activity was normalized to co-transfected Renilla luciferase. Controls were cells transduced with an empty or an Oct4-expressing
lentivirus (white and black bars, respectively). Mean ± SD; n = 3.

(C) COS-7 cells were transfected with 6xOct4-TK-Firefly-Luc reporter and incubated with Oct4-VP16 protein (Oct4 protein in elution buffer, diluted 1:10 in culture
media, black bar) or without protein (white bar). Red bars represent cells transduced with Oct4-VP16 protein in the absence of one of the elution buffer
components: A: NaCl (1 M), B: NaH2PO4 (50 mM), C: Tris-HCl (50 mM), D: NDSB-201 (250 mM), E: 2-mercaptoethanol (100 µM), F: MgCl2 (125 µM), and ZnCl2
(125 µM). Firefly luciferase activity was normalized to co-transfected Renilla luciferase. Mean ± SD; n = 3.

(D) Schematic representation of the beta-lactamase reporter assay. The cell-permeable CCF2/AM compound is trapped in the cytoplasm by intracellular es- 
terases, which convert it to the non-membrane-permeable, fluorescent CCF2. Excitation of CCF2 at 409 nm results in an emission signal at 520 nm (green signal). 
CCF2 cleavage by intracellular beta-lactamase abrogates intramolecular FRET, resulting in a shift in the emission wavelength to 447 nm (blue signal). 

Protective Osmolytes Rescue Hypertonicity-Induced
Cell-Cycle Inhibition 

Although the combination of NaCl-induced hyperosmolality and
NDSB-201 promoted efficient protein transduction, it also affected
cell proliferation. Hyperosmotic stress is well known to induce
cell-cycle arrest, followed by apoptosis, in mammalian cells (Kültz et al.,
1998). We observed by BrdU incorporation that protein
transduction for 3 hr at 700 mOsmol/kg or for 12 hr at 500 mOsmol/kg
reduced MEF proliferation by more than 60% compared to
untreated cells, independent of the presence of beta-lactamase
(Figure 2A). Apoptosis, measured by a caspase-3/7 activity assay,
was not detected in transduced cells (data not shown).

Osmoprotectants (or protective osmolytes) help cells cope 
with osmotic stress by accumulating in the cytosol, thereby 
balancing the osmotic difference between the intra- and extracel- 
lular environment. We tested whether the addition of osmopro- 
tectants to media would prevent cell-cycle arrest during protein 
transduction. MEFs were treated with transduction media alone 
or supplemented with different osmoprotectants, and BrdU 
incorporation was measured as indicated in Figure 2B. In the 
absence of osmoprotectants, incubation of MEFs with transduc- 
tion media reduced the cell proliferation rate to 34% compared to 
non-transduced controls (Figure 2B, red bar). The addition of os- 
moprotectants during transduction ameliorated this cell-cycle 
arrest to various degrees as shown in Figure 2B (green bars). 
The combination of glycerol and glycine was the most
effective and almost completely prevented the hypertonicity-
induced cell-cycle arrest, while still allowing protein transduction 
(Figure 2C). Unless otherwise indicated, we therefore included 
glycerol and glycine in subsequent protein transduction 
experiments. 
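The legends to Figure 2 scale BrdU incorporation so that untreated cells are 100% and mitomycin-C-treated cells are 0%. A minimal sketch of that two-point scaling, assuming a linear rescaling of replicate means (the function name is hypothetical):

```python
from statistics import mean


def relative_proliferation(sample: list[float], untreated: list[float],
                           arrested: list[float]) -> float:
    """BrdU incorporation scaled so untreated cells = 100% and mitomycin-C-arrested cells = 0%."""
    low, high = mean(arrested), mean(untreated)
    if high == low:
        raise ValueError("untreated and arrested controls must differ")
    return 100.0 * (mean(sample) - low) / (high - low)
```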

To explore whether other cell types could similarly be trans- 
duced with minimal effect on cell proliferation, we transduced 
murine embryonic stem cells (mESCs) with beta-lactamase. 
mESCs appeared more sensitive to the hypertonic conditions:
transduction for 3 hr at 700 mOsmol/kg reduced mESC
proliferation even with the added osmoprotectants glycine and
glycerol (Figure 2D). We therefore examined whether lowering
NaCl-mediated hypertonicity and extending transduction time
(as shown in Figure 1E) would allow transduction of more
sensitive cell types. Indeed, transduction of mESCs for 12 hr at
500 mOsmol/kg resulted in effective beta-lactamase
transduction and minimal effect on cell proliferation (Figure 2D). In fact,
it was possible to perform two consecutive rounds of
beta-lactamase transduction, with BrdU incorporation values of 75%
compared to untreated cells (Figure 2E). Thus, a combination
of NaCl-mediated hypertonicity, NDSB-201, glycine, and glycerol
allows the transduction of native proteins with minimal effect
on cell proliferation.

Efficient Protein Transduction in Multiple Primary 
Cell Types 

To determine iTOP efficiency and the range of cell types that
could be transduced using this method, we transduced Cre
recombinase protein into mESCs and other primary cells. We used
mESCs containing a single copy of a Cre-activatable GFP
reporter integrated in the Rosa26 locus (Srinivas et al.,
2001). As outlined in Figure 3A, Cre-mediated removal of a
loxP-flanked stop cassette activates a GFP reporter, allowing
assessment of the percentage of successfully transduced cells
in a population. mESCs were transduced with Cre protein at
500 mOsmol/kg for 12 hr. Increasing concentrations of
recombinant Cre protein resulted in an increased percentage of
GFP-positive (GFP+) cells (Figure 3B). One round of transduction
with 10 µM Cre protein activated the GFP reporter in 51%
of the mESCs, increasing to 79% after a second round of
transduction (Figure 3B). To confirm that protein transduction
did not affect mESC function, we tested the ability of transduced 
mESCs to form chimeras upon injection into recipient blastocyst 
embryos. Figure 3C shows an image of one of the chimeric mice, 
demonstrating that Cre protein transduction does not affect 
mESC pluripotency. Chimeric mice were able to generate 
offspring, demonstrating that Cre-transduced mESCs contrib- 
uted to the germline (Figure 3D). 

We next explored Cre recombinase protein transduction in 
multiple murine primary cell types isolated from mice carrying 
one copy of a lox-mRFP-lox-mGFP reporter. A single round of Cre protein
transduction for 12 hr at 500 mOsmol/kg efficiently activated
the GFP reporter in multiple cell types, including neuronal and
gut stem cells, dendritic cells, embryonic fibroblasts, glial cells,
and neurons (Figure 3E). The above experiments demonstrate
a highly efficient protein transduction method that is applicable 
to many primary cell types. 

We also explored whether human embryonic stem cells
(hESCs) could be transduced using this method. We used
H1 hESCs stably transduced with a lentiviral Cre-activatable
reporter containing an active red fluorescent protein (RFP)
gene flanked by loxP sites, followed by a stop sequence and
an (inactive) GFP reporter gene (Figure 3F). Cre-mediated deletion
of the loxP-flanked RFP-stop segment activates the GFP
reporter. The H1 reporter hESC line was transduced with Cre
protein for 12 hr at 500 mOsmol/kg. As shown in Figure 3G, 
we obtained 64% and 78% of GFP-positive cells after one and 
two rounds of Cre protein transduction, respectively. 



(E) MEFs were transduced with 1 µM beta-lactamase protein and 25 mM NDSB-201 at different NaCl-adjusted osmolalities (indicated by the colored lines) and for
varying amounts of time as indicated. Isotonic media (open diamonds) containing NDSB-201 and beta-lactamase protein, but without additional NaCl, were used as
a negative control. Relative beta-lactamase activity calculated per well is plotted as a function of transduction time. Dotted lines indicate cell
death due to prolonged transduction. Mean ± SD; n = 4.

(F) Analysis of the transduction activity of different hypertonicity inducers. MEFs were transduced for 3 hr with 1 µM beta-lactamase protein at an osmolality of
700 mOsmol/kg induced by different compounds as indicated (red bars, transduction with NDSB-201; open circles, control transduction in the absence of
NDSB-201). Relative beta-lactamase protein uptake in isotonic transduction media (bar #1) was set at 1. Mean ± SD; n = 3.

(G) Beta-lactamase reporter assay on MEFs transduced for 3 hr with 1 µM beta-lactamase protein and different concentrations of NDSB-201 at an osmolality
of 700 mOsmol/kg induced by NaCl (red bars). The controls were cells treated as before but in isotonic media (white circles). Mean ± SD; n = 3.

(H) Beta-lactamase reporter assay on MEFs transduced as in (G) with different concentrations of beta-lactamase protein (red bars). The controls were cells
treated as before in isotonic media (white circles). Mean ± SD; n = 3.
Figure 2. Addition of Osmoprotectants to the Transduction Media Ameliorates Hypertonicity-Induced Cell-Cycle Inhibition

(A) Beta-lactamase assay (red bars) coupled with BrdU incorporation assay (green squares). MEFs were transduced with 25 mM NDSB-201 at different 
osmolalities and time points with or without beta-lactamase as indicated. Beta-lactamase activity was measured relative to cells incubated in isotonic media,
which were set at 1. Cell proliferation was measured by BrdU incorporation as described in the Experimental Procedures. BrdU incorporation of untreated cells was set at 100%
(not shown). Mean ± SD; n = 3.

(B) MEFs were incubated with transduction media containing 1 µM beta-lactamase protein, 50 mM NDSB-201, and a NaCl-induced osmolality of
500 mOsmol/kg (red bar) or with transduction media supplemented with different osmoprotectants as indicated (green bars). The BrdU incorporation
values of untreated cells (left black bar) were set at 100%. BrdU incorporation values of mitomycin-C-treated cells used as control for cell-cycle arrest were
set at 0%. Mean ± SD; n = 3.

(C) Beta-lactamase transduction (red bars) and BrdU incorporation (green squares) in MEFs. Cells were transduced with 25 mM NDSB-201 at different
osmolalities and time points with or without beta-lactamase as indicated, with addition of 30 mM glycerol and 15 mM glycine as osmoprotectants.
Beta-lactamase activity was measured relative to cells incubated in isotonic media, which were set at 1. The BrdU incorporation values of untreated cells and
mitomycin-C-treated cells were set at 100% and 0%, respectively (not shown). Mean ± SD; n = 3.

(D) Beta-lactamase transduction (red bars) and BrdU incorporation (green squares) in mESCs as in (C). Mean ± SD; n = 3. 

(E) Beta-lactamase transduction (red bars) and BrdU incorporation (green squares) in mESCs after one or two rounds of protein transduction. mESCs were 
transduced once or twice as indicated, with a 12 hr interval between transductions. Relative beta-lactamase activity was measured in relation to cells
incubated in isotonic media, which were set at 1. The BrdU incorporation values of untreated cells and mitomycin-C-treated cells were set at 100% and 0%,
respectively (not shown). Mean ± SD; n = 3.



Essential Structural Features of the Transduction 
Compound 

As shown, the small-molecule NDSB-201 is essential for the 
introduction of native protein into cells. NDSB-201 is part of a 



group of zwitterionic compounds used to reduce protein aggre- 
gation and facilitate protein refolding (Vuillard et al., 1995). Six 
different NDSB molecules are commercially available (Figure 4A). 
To determine the essential chemical properties of NDSB-201, 
we analyzed the transduction activity of these NDSBs and their
effect on cell survival. As described before, MEFs were trans- 
duced with beta-lactamase protein for 3 hr with NaCI-adjusted 
osmolality of 700 mOsmol/kg. Beta-lactamase protein transduc- 
tion of NDSB-201 (reference molecule #01) was set as 100%. 
As shown in Figure 4A, all NDSB molecules were capable of 
transducing beta-lactamase protein but with varying efficiencies 
and impact on cell proliferation. Whereas molecules #02 and #03 
performed similar or better than our reference compound, 
molecules #04 and #06 lowered beta-lactamase transduction 
levels and reduced cell proliferation rate; molecule #05 per- 
formed poorly and arrested cell cycle, even in the presence of 
osmoprotectants. 
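The relative values in Figure 4A are anchored at two reference conditions. A minimal sketch of such two-point normalization, assuming a simple linear rescaling (the exact calculation is given in the Experimental Procedures, which are not part of this section); `relative_percent` is a hypothetical name:

```python
def relative_percent(value: float, zero_ref: float, full_ref: float) -> float:
    """Linearly rescale a raw readout so that zero_ref maps to 0% and full_ref to 100%.

    Hypothetical helper; assumes a simple two-point linear normalization.
    """
    span = full_ref - zero_ref
    if span == 0:
        raise ValueError("the two reference readouts must differ")
    return 100.0 * (value - zero_ref) / span


# Beta-lactamase axis: isotonic medium = 0%, NDSB-201 (#01) = 100%.
# BrdU axis: mitomycin-C-treated cells = 0%, untransduced cells = 100%.
```

The same helper covers both axes of Figure 4; only the choice of the two reference readouts changes.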

The sulfonate group is a known potential cause of small-molecule toxicity. We therefore replaced the sulfonate group in molecules #01 and #03 with a carboxyl group (generating molecules #09 and #10, Figure 4B) and tested the effect on transduction and cell proliferation. As shown in Figure 4B, replacement of the sulfonate group did not affect protein transduction efficiency but further improved cell proliferation, reaching 95% BrdU incorporation relative to non-transduced controls.

Further reducing the complexity of compound #09 yielded compound #11, in which the permanently positively charged quaternary amine group of compound #09 was replaced by a primary amine. The resulting molecule is gamma-aminobutyric acid (GABA), which exists as a zwitterion at neutral pH and thus, like the parent compound, carries a positive and a negative charge at its termini. As shown in Figure 4B, GABA yielded excellent protein transduction efficiency with minimal effect on cell proliferation (95% BrdU incorporation relative to untransduced controls). The finding that GABA, a well-known neurotransmitter, can mediate protein transduction was unexpected and suggested the involvement of GABA signaling in this process. However, protein transduction was affected neither by the addition of well-known GABA agonists (Figure S2A) nor by specific inhibitors of GABA receptor signaling (Figure S2B). Moreover, the concentration at which GABA is effective in protein transduction (25 mM) is similar to the concentration of NDSB-201 and is about 10,000-fold higher than its EC50 as a neurotransmitter (~2.5 µM; Mortensen et al., 2010). It therefore seemed likely that the physicochemical properties of GABA, rather than its role as a signaling molecule, were responsible for its protein-transducing activity. As mentioned above, the non-detergent sulfobetaine (NDSB) compounds were developed for their ability to enhance protein solubility (Vuillard et al., 1995).
We therefore tested whether GABA could similarly prevent protein aggregation. A particularly insoluble protein is Cas9, an RNA-guided nuclease that can be used for specific gene editing, which we describe in more detail below. Production and purification of recombinant Cas9 protein proved particularly challenging and required a higher concentration of NDSB-201 to prevent protein aggregation and precipitation. To test whether GABA was similarly able to promote protein solubility, we made a dilution series of either NDSB-201 or GABA and added recombinant Cas9 protein at a final concentration of 10 µM. NDSB-201 efficiently prevented protein precipitation at a concentration of 200 mM. As shown in Figure 4C, GABA could substitute for NDSB-201 and prevent protein aggregation at similar concentrations.
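The 10,000-fold figure quoted above follows from expressing both concentrations in the same unit. A minimal arithmetic check; the helper name is illustrative, and the EC50 is the approximate ~2.5 µM value cited in the text:

```python
MILLIMOLAR = 1e-3  # mol/L
MICROMOLAR = 1e-6  # mol/L


def fold_difference(conc_a_molar: float, conc_b_molar: float) -> float:
    """How many times higher conc_a is than conc_b (both in mol/L)."""
    return conc_a_molar / conc_b_molar


gaba_for_transduction = 25 * MILLIMOLAR  # 25 mM working concentration
gaba_ec50 = 2.5 * MICROMOLAR             # ~2.5 uM approximate EC50 as a neurotransmitter
print(round(fold_difference(gaba_for_transduction, gaba_ec50)))  # 10000
```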

At neutral pH, NDSBs and their analogs (Figure 4) are zwitterionic compounds that have a negatively charged hydrophilic group, a short three-carbon hydrophobic chain, and a positively charged amine terminus with various possible substituents. The hydrophobic middle domain is too short to form micelles, and thus NDSBs are not considered detergents (Vuillard et al., 1995).
To identify the minimal structure necessary for transduction, we tested the beta-lactamase protein transduction activity of several analogs of NDSB and/or GABA (Figure 4D). We analyzed the importance of the charged amine and sulfonate/carboxyl termini by removing these groups from the structure. The analogs lacking the amine (#07 and #12, Figure 4D) or the sulfonate/carboxyl group (#08 and #13, Figure 4D) yielded very poor protein transduction levels and reduced cell proliferation, although cell survival and beta-lactamase protein solubility were not affected (data not shown). These data demonstrate that the presence of the amine and sulfonate/carboxyl groups at the termini of the transduction compound is essential for protein transduction. Finally, we evaluated the optimal distance between the amine and sulfonate/carboxyl groups by varying the length of the carbon chain in molecules #10 and #11 (Figure 4E). As shown in Figure 4E, protein transduction was impaired when the carbon chain was shorter or longer than three carbons. Together, these data demonstrate that the minimal structure necessary for efficient protein transduction is a zwitterionic molecule which, at neutral pH, consists of a positively charged amino group and a negatively charged sulfonate or carboxyl group separated by a three-carbon chain. To ensure that the reduced transduction efficiencies observed with some of the compounds were not the result of poor protein solubility, we analyzed beta-lactamase protein precipitation in the presence of the different betaine compounds. With the exception of the control sample, in which protein precipitation was induced by adding ethanol, none of the transduction compounds described above induced beta-lactamase protein precipitation (Figure S2C).
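The structure-activity rule above can be written as a simple predicate. This is a hypothetical encoding for illustration: each compound is reduced to three descriptors (presence of the amine, presence of the sulfonate/carboxyl group, and the number of carbons separating them), with spacer counts for GABA, NDSB-201, glycine, and glycine betaine taken from their standard structures. The rule describes minimal requirements only, so it is necessary rather than sufficient: NDSB #05 shares the three-carbon NDSB scaffold yet transduced poorly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    has_amine: bool          # positively charged amine terminus present
    has_anionic_group: bool  # sulfonate or carboxylate terminus present
    spacer_carbons: int      # carbons separating the two charged groups


def meets_minimal_rule(c: Compound) -> bool:
    """Minimal structure described above: both charged termini and a three-carbon spacer."""
    return c.has_amine and c.has_anionic_group and c.spacer_carbons == 3


examples = [
    Compound("GABA", True, True, 3),             # H3N+-(CH2)3-COO-
    Compound("NDSB-201", True, True, 3),         # pyridinio-(CH2)3-SO3-
    Compound("glycine", True, True, 1),          # H3N+-CH2-COO-
    Compound("glycine betaine", True, True, 1),  # (CH3)3N+-CH2-COO-
]
for c in examples:
    print(c.name, meets_minimal_rule(c))
```

Glycine and glycine betaine fail only on spacer length, which matches their use as shorter-chain controls in Figure 5F.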

Dissecting the Mechanism of Protein Transduction 

Because proteins are too large to diffuse through the plasma membrane, we suspected that protein transduction involved an active transport mechanism. Extracellular particles can enter cells via several distinct endocytic pathways: dynamin-dependent endocytosis, which is further subdivided into clathrin- and caveolae-mediated endocytosis, and dynamin-independent endocytosis, which includes macropinocytosis, the uptake of large (0.5-5 µm) vesicles containing gulps of extracellular fluid. We used specific inhibitors of these endocytic pathways to determine the mechanism of protein uptake. MEFs were incubated with inhibitors of dynamin-, clathrin-, or caveolin-mediated endocytosis or of macropinocytosis for 1 hr prior to transduction with beta-lactamase protein. As shown in Figure 5A, inhibition of dynamin-, clathrin-, or caveolin-mediated endocytosis did not affect beta-lactamase uptake. In contrast, the macropinocytosis inhibitors 5-(N-ethyl-N-isopropyl)amiloride (EIPA) and 5-(N,N-dimethyl)amiloride (DMA) caused a profound reduction in beta-lactamase transduction. Macropinocytosis requires active rearrangement of the actin cytoskeleton, which is mediated by the small GTPases Rac1 and CDC42 through activation of the downstream effector kinase Pak1. As expected, cytochalasin D and latrunculin A, which potently block actin polymerization, inhibited beta-lactamase transduction (Figure 5A). In addition, specific inhibitors of RAC1, CDC42, and Pak1, alone or in combination, efficiently blocked beta-lactamase transduction in MEFs (Figures S3A and S3B). Altogether, these results indicate that protein transduction is mediated through macropinocytosis.

Macropinocytosis is regulated by a family of Na+/H+ antiporters (NHEs), which are targeted by the EIPA and DMA inhibitors described above. These NHE antiporters are rapidly activated in response to a wide variety of extracellular stimuli, including hypertonicity induced by Na+-containing salts. Nhe1 (Slc9a1) is a ubiquitously expressed member of this family and is therefore a likely candidate to be involved in the protein transduction process. To determine the role of Nhe1 in protein transduction, we analyzed beta-lactamase protein transduction in Nhe1 mutant MEFs. Heterozygous or homozygous deletion of Nhe1 resulted in a profound reduction in intracellular beta-lactamase activity, demonstrating that Nhe1 plays an important role in protein transduction (Figure 5B). The residual protein transduction activity observed in the absence of Nhe1 function likely reflects redundancy with other members of the Nhe antiporter family.

Several growth factors that activate tyrosine kinase signaling have been shown to stimulate macropinocytosis. We therefore examined whether the addition of growth factors could enhance intracellular delivery of beta-lactamase protein. As shown in Figure 5C, EGF, bFGF, PDGF, IGF, and insulin all enhanced beta-lactamase transduction, and combinations had an additive effect.
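This section does not define its additivity criterion; one common reading is that the increase produced by a combination equals the sum of the increases produced by each factor alone. A sketch of that expectation, with a hypothetical function name and no data from the study:

```python
def additive_expectation(baseline: float, single_factor_values: list[float]) -> float:
    """Predicted readout for a growth-factor combination if each factor's effect simply adds.

    baseline: readout without growth factor (e.g., bar #2 in Figure 5C, set at 100%).
    single_factor_values: readout with each factor alone, in the same units.
    """
    return baseline + sum(v - baseline for v in single_factor_values)
```

A measured combination close to this prediction is consistent with additivity; a value well above it would instead suggest synergy.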

The above data demonstrate that iTOP protein transduction occurs through macropinocytotic uptake of extracellularly applied protein, which is subsequently released from the internalized macropinosomes. To quantify the respective roles of NaCl hypertonicity and the transduction compound in protein uptake and intracellular release, we set up two imaging-based assays.

Macropinocytosis can be quantified by measuring the uptake of fluorescently labeled high-molecular-weight dextran (Figure 5D; Commisso et al., 2014). To determine whether dextran and protein uptake followed the same path, MEFs were co-transduced with red fluorescent dextran (TMR-dextran) and far-red fluorescent BSA protein (BSA-Alexa647) for 1 hr at 700 mOsmol/kg. As shown in Figure 5D, all dextran-positive macropinosomes contained BSA and vice versa, demonstrating that the uptake of TMR-dextran and of proteins proceeds via the same mechanism. Moreover, the simultaneous uptake of dextran and BSA was blocked by the macropinocytosis inhibitor EIPA (Figure 5D). Together, these data demonstrate that the recently described TMR-dextran assay for the quantification of macropinocytosis (Commisso et al., 2014) can be used to accurately monitor the macropinocytotic uptake step in the iTOP process.
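The statement that all dextran-positive macropinosomes contained BSA, and vice versa, corresponds to two per-vesicle fractions both equal to 1. A minimal sketch of that calculation, assuming vesicles have already been segmented and scored as positive or negative for each label (the segmentation step and all names here are hypothetical):

```python
def _fraction(hits: int, total: int) -> float:
    """Return hits/total, or NaN when there is nothing to divide by."""
    return hits / total if total else float("nan")


def overlap_fractions(vesicles: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Per-vesicle co-occurrence of the two labels.

    Each vesicle is a (dextran_positive, bsa_positive) pair, e.g. obtained by
    thresholding the TMR and Alexa647 channels (hypothetical upstream step).
    Returns (fraction of dextran+ vesicles that are also BSA+,
             fraction of BSA+ vesicles that are also dextran+).
    """
    dextran_pos = [v for v in vesicles if v[0]]
    bsa_pos = [v for v in vesicles if v[1]]
    dex_with_bsa = sum(1 for v in dextran_pos if v[1])
    bsa_with_dex = sum(1 for v in bsa_pos if v[0])
    return _fraction(dex_with_bsa, len(dextran_pos)), _fraction(bsa_with_dex, len(bsa_pos))
```

Reporting both directions matters: a single fraction could be 1.0 even if many vesicles carried only the other label.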

To quantify the release of protein from internalized macropinosomes, we used a galectin-3 fluorescent reporter system that has been described previously to monitor vesicle leakage induced by drugs or pathogens (Paz et al., 2010; Ray et al., 2010). Galectin-3 is a small, soluble cytosolic protein that can bind beta-galactoside-containing carbohydrates. These carbohydrates are normally present only on the exterior of the plasma membrane and on the interior of intracellular endocytic vesicles. Rupture of internalized vesicles results in galectin-3 relocalization and accumulation at the internal vesicle membrane. Fusion of galectin-3 to a monomeric green fluorescent protein (mAG-GAL3) allows visualization of galectin-3 relocalization and has been used to monitor vesicle rupture during pathogen infection (Paz et al., 2010; Ray



Figure 3. Efficient Cre Protein Transduction in Multiple Primary Stem and Differentiated Cells 

(A) Schematic representation of the Cre reporter in mESCs. A single copy of a loxP-Stop-loxP-GFP reporter was inserted in the Rosa26 locus. Excision of the Stop cassette by Cre recombinase protein induces GFP expression.

(B) Left: FACS analysis of a dose-response curve of mESCs transduced with Cre at different concentrations as indicated (panels 1-4) or after two rounds of Cre transduction (panel 5). Right: fluorescence microscopy image of mESCs treated with two rounds of Cre transduction as described above. Dashed lines indicate the border of each colony. Scale bar, 50 µm.

(C) Left: schematic representation of the mouse chimera assay. mESCs were injected into host blastocyst embryos, and mESC contribution to the resulting chimeras was assessed by coat color. Cre-protein-transduced GFP-positive mESCs derived from agouti (F1 BL6/129Sv) mice (brown hair color) were injected into C57BL/6 host blastocysts (black hair color). Brown hair in the resulting pups indicates chimera contribution of the injected ESCs. Right: image of a litter with a chimera.

(D) Left: schematic representation of the germline transmission assay. The chimera in (C) was mated with a C57BL/6 female. Agouti coat color of the resulting pups demonstrates the ESC origin of the germ cells. Center: image of a litter with two agouti pups demonstrating germline transmission of the injected ESCs. Right: FACS analysis of GFP expression in blood cells of the pups depicted in the central panel. The agouti pups (#3 and #4) display GFP expression, whereas the black pups (#1 and #2) are negative.

(E) Left and middle: Cre protein transduction of multiple cell types derived from loxP-mRFP-loxP-mGFP reporter mice. Representative fluorescence and phase contrast images of Cre-transduced cells. Control cells were incubated with transduction media without Cre protein. Scale bar, 50 µm. The percentage of GFP-positive cells was determined by flow cytometry. Right top: fluorescence microscopy images of Cre-transduced glia cells. As a control, cells were incubated with transduction media without Cre protein. Left: GFP expression. Middle: expression of GFAP, a glia-specific marker. Right: merge. Scale bar, 50 µm. Right bottom: fluorescence microscopy images of Cre-transduced neural cells derived from loxP-stop-loxP-GFP mESCs. As a control, cells were incubated with transduction media without Cre protein. Left: GFP expression. Middle: expression of TuJ1, a neural cell-specific marker. Right: merge. Scale bar, 50 µm.

(F) Schematic representation of the Cre recombinase reporter. A lentiviral EF1α-loxP-RFP-loxP-GFP/ires-PuroR construct was stably introduced into hESCs. Cre excises the loxP-flanked RFP-STOP cassette and switches fluorescence from red (RFP) to green (GFP).

(G) Left: fluorescence and phase contrast images of Cre-transduced human ESCs. hESCs were transduced once or twice with Cre protein as indicated. Control cells were incubated with transduction media without Cre protein. Left row, RFP expression; middle row, GFP expression; right row, phase contrast. Scale bar, 50 µm. Right: flow cytometry analysis of Cre-transduced human ESCs. The density plots show the percentage of cells expressing RFP (upper left area) or GFP (lower right area). Double-positive cells (upper right area) result from multiple integrations of the lentiviral reporter. The total percentage of GFP-expressing cells is shown in the histogram plots.
et al., 2010). Upon binding of mAG-GAL3 to carbohydrates in the interior of ruptured vesicles, a multimeric complex with intense green fluorescence is formed (Figure 5E). Transduction of MEFs expressing mAG-GAL3 protein for 1-3 hr at 700 mOsmol/kg resulted in the appearance of bright green vesicles, demonstrating leakage of the internalized macropinosome membrane under iTOP conditions. mAG-GAL3-positive vesicles did not form when cells were pre-incubated with the macropinocytosis inhibitor EIPA or under isotonic conditions (Figure 5E). These results demonstrate that iTOP conditions promote protein uptake from the extracellular space via macropinocytosis and induce macropinosome leakage that releases proteins into the cytosol.

Using the above assays to quantify uptake and release, we measured beta-lactamase protein transduction, macropinocytosis, and vesicle leakage (Figure 5F). As expected, protein transduction measured by the beta-lactamase assay was observed only in cells treated with hypertonic media in the presence of either the NDSB-201 or the GABA transduction compound (Figure 5F, blue bars). Control compounds with a shorter carbon chain, glycine and glycine betaine, did not mediate beta-lactamase transduction, even though macropinocytosis was induced in all hypertonic conditions (Figure 5F, red bars), indicating that NaCl-hypertonic media alone are sufficient to efficiently induce macropinocytosis. The transduction compound alone, in the absence of NaCl-mediated hypertonicity, did not induce macropinocytosis (data not shown). The combined data on beta-lactamase transduction and macropinocytosis suggested that the transduction compounds were responsible for the release of protein from the internalized macropinosomes. Indeed, the mAG-GAL3 reporter assay demonstrated that intracellular macropinosome leakage occurred only in the presence of NDSB-201 or GABA (Figure 5F, green bars). When macropinosome leakage occurred, the vesicle contents were completely released into the cytoplasm (data not shown). Images of the mAG-GAL3 assay for the various compounds used in Figure 5F are shown in Figure S3C. The macropinocytosis inhibitor EIPA efficiently blocked beta-lactamase protein transduction, macropinocytosis, and the appearance of mAG-GAL3-positive vesicles. Together, these data indicate that NaCl hypertonicity induces macropinocytosis-mediated uptake of protein from the extracellular space, whereas NDSB-201 or GABA promotes intracellular macropinosome leakage, allowing release of the vesicle content into the cytosol.
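The two-step model summarized above can be expressed as a toy boolean function, purely to illustrate its logic (it is not a quantitative model, and the function name is hypothetical): uptake needs NaCl hypertonicity and is blocked by EIPA, release needs a transduction compound such as NDSB-201 or GABA, and protein reaches the cytosol only when both steps occur.

```python
def predicted_outcome(hypertonic_nacl: bool, eipa: bool, transduction_compound: bool) -> dict[str, bool]:
    """Toy boolean summary of the uptake/release model described in the text."""
    # Step 1: NaCl hypertonicity drives macropinocytosis; EIPA blocks it.
    uptake = hypertonic_nacl and not eipa
    # Step 2: leakage needs macropinosomes to exist and a transduction compound.
    leakage = uptake and transduction_compound
    # Protein reaches the cytosol only if both steps happen.
    return {"macropinocytosis": uptake, "leakage": leakage, "transduction": leakage}
```

For example, `predicted_outcome(True, False, False)` mirrors the glycine controls: macropinocytosis without leakage or transduction.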

Protein-Mediated Gene Editing 

The high efficiency of iTOP makes it particularly appealing for systems in which transient cell manipulation can elicit a binary cellular effect or response. The recently discovered CRISPR-Cas9 system consists of the Streptococcus pyogenes Cas9 nuclease protein, which is guided to specific genomic loci by a small guide RNA (sgRNA) (Charpentier and Doudna, 2013). The Cas9 nuclease creates a double-strand break at the target locus, which, when repaired through non-homologous end joining (NHEJ), frequently results in gene disruption through a frameshift mutation. Owing to its elegant simplicity, the CRISPR-Cas9 system has quickly become a popular tool for gene editing. However, application of this system by DNA or RNA transfection is inefficient in primary cells and typically requires marker selection of transfected cells. We explored whether iTOP of recombinant Cas9 protein and its guide RNA could offer an alternative, more efficient means of applying this gene-editing system.
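For Streptococcus pyogenes Cas9, the 20-nucleotide guide portion of the sgRNA (Figure 6A) matches a genomic protospacer that must lie immediately 5' of an NGG protospacer-adjacent motif (PAM). A minimal sketch of scanning one DNA strand for such candidate sites; the function name is hypothetical, and practical guide design would also scan the reverse complement and score potential off-target matches:

```python
import re


def find_spcas9_sites(seq: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, protospacer, PAM) for every NGG PAM on the given strand.

    Only the supplied strand is scanned; sites on the opposite strand would
    require scanning the reverse complement as well.
    """
    seq = seq.upper()
    sites = []
    # Zero-width lookahead so overlapping PAMs (e.g. 'AGGG') are all reported.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = m.start()
        if pam_start >= guide_len:  # need a full-length protospacer upstream
            protospacer = seq[pam_start - guide_len:pam_start]
            sites.append((pam_start - guide_len, protospacer, seq[pam_start:pam_start + 3]))
    return sites
```

Each reported protospacer is the sequence a 20-nt guide would need to match for Cas9 to cut about 3 bp upstream of the PAM.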

As shown above, Cas9 protein requires high concentrations of both salt (500 mM NaCl) and the transduction compound (250 mM) to remain soluble (Figures 4C and S4A). We noticed that, under these conditions, the NDSB-201 transduction compound was toxic to the cells, whereas GABA was well tolerated (Figures S4B and S4C). At this tonicity (1,250 mOsm/kg),



Figure 4. Structure-Activity Analysis of Protein Transduction Compounds 

(A) Top: chemical structures of the tested non-detergent sulfobetaines. Compound numbers are indicated below each structure. Bottom: beta-lactamase and BrdU incorporation assays using the different transduction compounds as indicated. MEFs were transduced for 3 hr with transduction media at a NaCl-adjusted osmolality of 700 mOsmol/kg containing 1 µM beta-lactamase protein, 30 mM glycerol, 15 mM glycine, and 25 mM of the indicated transduction compound. Beta-lactamase incorporation values of cells treated with transduction media containing the reference compound (NDSB-201, #01) were set as 100%, and values of cells treated with isotonic transduction media were set as 0%, as described in the Experimental Procedures. Open circles indicate relative BrdU incorporation by the transduced cells. BrdU incorporation of untransduced cells was set at 100%, and BrdU incorporation of mitomycin-C-treated cells was set at 0%. Mean ± SD; n = 3.

(B) Top: chemical structures of transduction compound analogs. First row: analogs with a sulfonate group. Second row: analogs with a carboxyl group. Compound numbers are indicated below each structure. Bottom: beta-lactamase and BrdU incorporation assays using the different transduction compounds as indicated. MEFs were transduced as described in (A). Beta-lactamase activity and BrdU incorporation values were analyzed as in (A). Mean ± SD; n = 3.

(C) Images demonstrating the effect of increasing concentrations of the transduction compounds NDSB-201 or GABA on Cas9 protein solubility. From left to right, increasing transduction compound concentration (mM). Rows show different transduction compounds. Scale bar, 50 µm.

(D) Top: chemical structures of transduction compound analogs. Compounds in the left column contain an amine and a sulfonate or carboxyl group. The central column shows analogs without the amine group. The right column shows analogs without the sulfonate or carboxyl group. Compound numbers are indicated below each structure. Bottom: beta-lactamase and BrdU incorporation assays using the different transduction compounds as indicated. MEFs were transduced with beta-lactamase protein, and beta-lactamase activity and BrdU incorporation were analyzed as in (A). Beta-lactamase activity of the reference compounds (NDSB-195, #06, and GABA, #11) was set at 100%. Values for compounds #07 and #08 were normalized to compound #06. Values for compounds #12 and #13 were normalized to compound #11. Beta-lactamase activity of cells treated with isotonic transduction media was set as 0%. BrdU incorporation is shown as open circles as in (A). Mean ± SD; n = 3.

(E) Analysis of the role of carbon-chain length. Top: structures of two transduction compounds (reference molecules #10 and #11) and their carbon-chain-length variants. Bottom: beta-lactamase and BrdU incorporation assays using the different transduction compounds as indicated. MEFs were transduced with beta-lactamase protein, and beta-lactamase activity and BrdU incorporation were analyzed as in (A). Beta-lactamase incorporation of the reference compounds (#10 and #11) was set at 100%. Beta-lactamase activity of cells treated with isotonic transduction media was set as 0%. Values for compounds #14 and #15 were normalized to compound #10. Values for compounds #16, #17, and #18 were normalized to compound #11. BrdU incorporation is shown as open circles as in (A). Mean ± SD; n = 3. For more details of the transduction compounds and their analogs, see Table S2.
transduction occurs within 60-90 min and is extremely efficient, as shown by the transduction of recombinant Cre protein (Figures S4D-S4F). Previous reports have demonstrated that hypertonic stress can cause DNA breaks in inactive regions of the genome (Redon and Bonner, 2011). However, our iTOP transduction conditions include osmoprotectants, which should prevent DNA damage from occurring. We performed a TUNEL (terminal deoxynucleotidyl transferase dUTP nick end labeling) assay to determine the effect of our iTOP hypertonic conditions in the presence or absence of osmoprotectants. As expected, the osmoprotectants in the iTOP transduction buffer effectively prevented hypertonicity-induced DNA damage (Figure S4G). In addition to the recombinant Cas9 protein, CRISPR/Cas9 gene editing requires intracellular delivery of sgRNAs. To determine whether iTOP also allows the intracellular delivery of RNA molecules, we analyzed the effect of siRNA transduction on knockdown of the target gene GAPDH. As shown in Figure S4H, iTOP transduction of siRNA resulted in efficient and specific knockdown of GAPDH, demonstrating that, in addition to protein, iTOP can be used to deliver small RNAs.
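The 1,250 mOsm/kg tonicity of the Cas9 condition can be approximated from its composition with the ideal (van 't Hoff) sum, in which each solute contributes its concentration multiplied by the number of particles it dissociates into. A minimal sketch, assuming ideal behavior, treating mM as mmol/kg, and counting only NaCl and GABA; measured osmolality of concentrated NaCl solutions is somewhat lower because the osmotic coefficient is below 1. The function and dictionary names are illustrative:

```python
def ideal_osmolality(solutes: dict[str, tuple[float, int]]) -> float:
    """Ideal estimate in mOsm/kg: sum of concentration (mM) x particles per formula unit.

    Assumes dilute, ideal behavior (osmotic coefficient of 1) and treats mM as mmol/kg.
    """
    return sum(conc_mm * particles for conc_mm, particles in solutes.values())


cas9_condition = {
    "NaCl": (500.0, 2),  # dissociates into Na+ and Cl-
    "GABA": (250.0, 1),  # zwitterion, counted as a single particle
}
print(ideal_osmolality(cas9_condition))  # 1250.0
```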

To explore whether recombinant Cas9 protein and guide RNA could be co-transduced into cells under iTOP conditions (iTOP-CRISPR/Cas9), we produced recombinant Cas9 protein and generated sgRNAs by in vitro transcription from DNA templates (Figure 6A). A reporter in which an AAVS1 target sequence is followed by an out-of-frame, non-fluorescent dTomato gene was used to monitor the activity of transduced recombinant Cas9-sgRNA (Figure 6B). CRISPR-Cas9 targeting of the AAVS1 sequence can shift the reading frame and activate dTomato fluorescence. KBM7 cells stably expressing the reporter were transduced with Cas9 protein together with the corresponding AAVS1 sgRNA. After one round of Cas9-sgRNA transduction, 30% of reporter KBM7 cells re-established dTomato protein expression (Figure 6C). After a second round of Cas9-sgRNA transduction, 56% of the cells became dTomato positive. Targeting was specific to the AAVS1 sequence, because off-target sgRNAs with two-nucleotide substitutions did not activate the dTomato reporter (Figure 6C). Similar experiments were performed in H1 human embryonic stem cells, with 10% and 26% dTomato-positive cells after one and two rounds of transduction, respectively (Figure 6D). These results demonstrate that iTOP of recombinant Cas9 protein and sgRNA allows efficient and specific gene modification in reporter cells.
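Whether an NHEJ indel reactivates the reporter depends only on the net indel length modulo 3. A minimal sketch of that check, assuming the reporter is out of frame by a fixed number of nucleotides (`reporter_offset`, a hypothetical parameter; the actual offset of the AAVS1-dTomato construct is not given in this section):

```python
def restores_frame(reporter_offset: int, indel_length: int) -> bool:
    """Return True if a net indel brings an out-of-frame reporter back into frame.

    reporter_offset: number of extra nucleotides placing dTomato out of frame
        (hypothetical parameter; the reporter's actual offset is not stated).
    indel_length: net length change at the cut site (+ insertion, - deletion).
    """
    return (reporter_offset + indel_length) % 3 == 0


# With a +1 offset, these indels between -6 and +6 would restore the frame:
print([n for n in range(-6, 7) if n != 0 and restores_frame(1, n)])  # [-4, -1, 2, 5]
```

Under this model only indels of suitable length switch the reporter on, so dTomato-positive cells would be expected to represent a subset of all edited cells.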

To determine the efficiency of the iTOP-CRISPR/Cas9 system, we targeted an endogenous gene. We chose DPH7 (WDR85), a gene that was identified as an essential host factor for diphtheria toxin lethality (Carette et al., 2009). Biallelic deletion of DPH7 renders human cells resistant to diphtheria-toxin-induced cell death, providing a simple and effective means of identifying knockout cells and measuring the efficiency of biallelic gene knockout upon iTOP-CRISPR/Cas9. We used a near-diploid clone of KBM7 cells to disrupt the DPH7 gene (Figure S5A). KBM7 cells were transduced twice with Cas9 protein plus one of six different sgRNAs, as indicated in Figure 7A. Seven days after protein transduction, cells were treated with diphtheria toxin for 48 hr, after which the number of viable



Figure 5. Protein Transduction Is Mediated by Macropinocytosis 

(A) MEFs were preincubated for 1 hr and transduced for 3 hr in the presence of small-molecule inhibitors of dynamin-mediated endocytosis (Dynasore), clathrin-mediated endocytosis (Pitstop2 and chlorpromazine), caveolin-mediated endocytosis (nystatin), macropinocytosis (EIPA, ethylisopropylamiloride, and DMA, dimethylamiloride), or actin polymerization (cytochalasin D and latrunculin A) as indicated. MEFs were transduced for 3 hr with 1 µM beta-lactamase protein at a NaCl-adjusted osmolality of 700 mOsmol/kg in transduction media containing 25 mM NDSB-201, 30 mM glycerol, and 15 mM glycine, supplemented with small-molecule inhibitors as indicated. Relative beta-lactamase protein uptake in isotonic transduction media (left bar) was set at 1. Mean ± SD; n = 3.

(B) Role of Nhe1 in protein transduction. MEFs derived from wild-type, Nhe1 heterozygous (+/-), and Nhe1 knockout (-/-) embryos were transduced for 3 hr with 1 µM beta-lactamase as in (A). Beta-lactamase transduction values of wild-type cells (bar #4) were set at 100%, and beta-lactamase transduction of wild-type cells in isotonic media (bar #1) was set at 0%. Mean ± SD; n = 3.

(C) The effect of growth factors on beta-lactamase transduction. MEFs were transduced for 3 hr with 1 µM beta-lactamase protein at a NaCl-adjusted osmolality of 700 mOsmol/kg as in (A) in the presence of 20 ng/ml epidermal growth factor (EGF), 20 ng/ml basic fibroblast growth factor (bFGF), 20 ng/ml platelet-derived growth factor (PDGF), 20 ng/ml insulin-like growth factor (IGF), 2.5 µg/ml insulin, and combinations of growth factors as indicated. Beta-lactamase values of cells in transduction media without growth factor (bar #2) were set at 100%, and beta-lactamase values of cells in isotonic transduction media (bar #1) were set at 0%. Open circles indicate relative BrdU incorporation by the transduced cells. BrdU incorporation of untransduced cells was set at 100%, and BrdU incorporation of mitomycin-C-treated cells was set at 0%. Mean ± SD; n = 3.

(D) Left: schematic representation of the macropinocytosis quantification. Right: to assess whether the transduction buffer permits the simultaneous uptake of proteins and non-protein molecules, we analyzed macropinocytosis-mediated uptake of TMR-dextran (red) and fluorescently labeled BSA protein (cyan) by GFP-expressing MEFs. Merging red and cyan gives white by additive color mixing. Transduced cells were incubated at 700 mOsmol/kg with 25 mM NDSB-201. The macropinocytosis inhibitor EIPA (ethylisopropylamiloride) inhibits uptake of TMR-dextran and BSA protein. Nuclei were stained with Hoechst 33342 (blue). Scale bar, 50 µm.

(E) Left: schematic representation of the mAG-GAL3 reporter assay. Upon initiation of transduction, extracellularly applied protein is taken up into macropinosomes (black vesicles). Intracellular disruption of the macropinosome membrane releases the macropinosome content into the cytoplasm and allows entry of cytosolic mAG-GAL3 protein, resulting in a bright fluorescent signal (bright green vesicles). Right: mAG-GAL3 cells were incubated with transduction media at 700 mOsmol/kg supplemented with 25 mM NDSB-201, with or without the macropinocytosis inhibitor EIPA as indicated. Untreated cells were included as a negative control. Note that, under transducing conditions (middle), mAG-GAL3 accumulates in the compromised macropinosomes. Scale bar, 200 µm.

(F) Measurement of protein transduction activity, macropinocytosis, and macropinosome leakage for NDSB-201 and examples of derivative compounds. Cells were incubated with transduction buffer at 700 mOsmol/kg with different transduction compounds or left untreated, as indicated. Left: relative beta-lactamase protein incorporation in MEFs. Beta-lactamase transduction of cells in isotonic transduction media (black bar) was set at 1. Middle: macropinocytosis levels were measured by TMR-dextran incorporation in cells treated as described above, and the total area of dextran-positive vesicles per cell was determined. Right: macropinosome leakage was determined by measuring the total area of mAG-GAL3-positive vesicles per cell. Mean ± SD; n = 3.
Figure 6. Gene Editing Using Simultaneous Transduction of Recombinant Cas9 Protein and sgRNA

(A) Top: schematic representation of the in-vitro-transcribed sgRNAs containing 20 nucleotides of guide sequence and 80 nucleotides of scaffold sequence. 
Bottom: protein gel of recombinant purified Streptococcus pyogenes Cas9 protein. 

(B) Schematic representation of the CRISPR-Cas9 reporter system. Cells were transduced with a lentiviral vector containing the CRISPR-Cas9 target sequence followed by an out-of-frame dTomato sequence. CRISPR-Cas9 induces a DNA double-strand break in the target sequence, followed by NHEJ repair that introduces deletions and/or insertions. These modifications may restore the dTomato reading frame, producing red cellular fluorescence. Colored DNA sequences represent the target sequence, in which red and green text indicate the sgRNA-binding sequence and the protospacer-adjacent motif (PAM), respectively.

(C) CRISPR-Cas9 reporter KBM7 cells were transduced with Cas9 protein and target sgRNA. Negative control cells were transduced without Cas9 and sgRNA. Specificity controls were performed by transducing cells with Cas9 protein and off-target sgRNAs. The percentage of dTomato-positive cells was determined by flow cytometry. Bottom: phase contrast and fluorescence images for the indicated conditions. Scale bar, 250 µm.

(D) CRISPR-Cas9 reporter H1 human embryonic stem cells were transduced with Cas9 protein and on-target sgRNA. Negative control cells were transduced without Cas9 and sgRNA. Specificity controls were performed by transducing cells with Cas9 protein and off-target sgRNAs. The percentage of dTomato-positive cells was determined by flow cytometry. Bottom: phase contrast and fluorescence images for the indicated conditions. Dotted lines delineate the border of the hESC colony. Scale bar, 50 µm.



cells was determined. Samples transduced with Cas9 protein 
and DPH7 sgRNAs yielded high levels of cell survival, whereas 
no viable cells were detected in diphtheria-toxin-treated wild- 
type KBM7 cells or in cells transduced with Cas9 protein with 
a control sgRNA (Figure 7A). DNA sequence analysis of the
pool of diphtheria-toxin-resistant cells demonstrated DPH7
gene disruption at the sgRNA target site in all resistant cells (Fig- 
ures 7B and S5B), confirming that diphtheria toxin resistance 
was the result of Cas9-sgRNA targeting. Similar results were 
obtained when H1 human ESCs were targeted with recombinant 
Cas9 protein coupled to DPH7 sgRNAs (Figure 7C). 



To determine the frequency of biallelic DPH7 gene disruption, 
we transduced KBM7 and H1 cells with Cas9 protein and the 
corresponding DPH7 sgRNA as described above. Upon trans- 
duction, single cells were sorted into 384-well plates (Figure 7D). 
After a week, emerging clones were treated with diphtheria toxin, 
and resistant clones were counted to quantify the percentage of 
knockout clones. Four of six sgRNAs yielded around 70%
resistant clones, a remarkable efficiency considering that
diphtheria toxin resistance requires biallelic disruption of the
DPH7 gene (Figure 7D). Indeed, sequence analysis revealed biallelic
mutations at the sgRNA target sites in all clones analyzed
(Figure 7E, top). Similar results and knockout efficiency were
obtained upon transduction of human ESCs (Figure 7E, bottom).
These results demonstrate the particular strength of the iTOP-
CRISPR/Cas9 system in enabling high-efficiency gene editing
in primary (stem) cells (Figure S5C). DPH7 knockout clones of
hESCs retained expression of essential hESC markers, as well
as the ability to generate derivatives of all three germ layers
in vitro, demonstrating that iTOP-CRISPR/Cas9 gene knockout
did not affect stem cell pluripotency (Figure S6).
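The knockout efficiency in Figure 7D is the percentage of single-cell clones that survive diphtheria toxin. The sketch below computes it and, under a simplifying model that is our assumption rather than the study's (a diploid locus whose two alleles are disrupted independently with the same probability p), inverts the biallelic fraction p² to estimate p. The clone counts in the example are hypothetical, chosen only to match the roughly 70% quoted above.

```python
import math


def knockout_efficiency(resistant_clones: int, total_clones: int) -> float:
    """Percentage of single-cell clones that survive diphtheria toxin."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    if not 0 <= resistant_clones <= total_clones:
        raise ValueError("resistant_clones must lie between 0 and total_clones")
    return 100.0 * resistant_clones / total_clones


def implied_per_allele_rate(biallelic_fraction: float) -> float:
    """Per-allele disruption probability p with p**2 == biallelic_fraction.

    Assumes a diploid locus whose two alleles are disrupted independently;
    this is a simplifying model, not something the study measured.
    """
    if not 0.0 <= biallelic_fraction <= 1.0:
        raise ValueError("biallelic_fraction must be in [0, 1]")
    return math.sqrt(biallelic_fraction)


if __name__ == "__main__":
    # Hypothetical counts chosen to match the ~70% figure quoted in the text.
    pct = knockout_efficiency(resistant_clones=70, total_clones=100)
    print(pct)                                           # 70.0
    print(round(implied_per_allele_rate(pct / 100), 3))  # 0.837
```

Under this model, about 70% biallelic knockout corresponds to disruption of roughly 84% of individual alleles.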

Conclusions 

Despite vast improvement in technologies for the delivery of 
DNA or RNA into cells, the manipulation of primary cells often 
remains difficult, with low percentages of targeted cells, poor 
control over insert copy number, and/or unstable gene expres- 
sion levels. The ability to efficiently transduce native proteins 
into primary cells offers new opportunities for direct cell manip- 
ulation without the need for DNA or RNA intermediates. Proteins 
allow cell manipulation in a non-integrative manner and are
particularly suited to binary systems in which a single, transient
cell manipulation results in a permanent change in cell function,
identity, or (epi)genetic state.

The iTOP system described here is highly efficient and flexibly
adaptable. The degree of NaCl hypertonicity, the type and
concentration of transduction compound, and the transduction time
can be fine-tuned to the needs of the user, the specific target
cell type, and the biochemical characteristics of the transduced
protein. Because the amount of transduced protein is directly
related to the extracellular protein concentration, the system
allows fine control over the effective intracellular protein level.

CRISPR/Cas9 gene editing has revolutionized our ability to modify
the genome, and a safe and efficient means of applying this
technology in primary cells would open the way to using
CRISPR/Cas9 to address genetic defects. Recent reports
demonstrate that delivery of recombinant Cas9 protein using CPPs,
electroporation, or cationic lipids results in effective gene
editing of immortalized cell lines, but primary cells remain
a challenge (Kim et al., 2014; Ramakrishna et al., 2014; Zuris
et al., 2014). We demonstrate that the iTOP system allows highly
efficient gene modification upon transduction of recombinant
Cas9 protein and in-vitro-transcribed sgRNA (iTOP-CRISPR/
Cas9). The transient nature of iTOP-CRISPR/Cas9 ensures
that the transduced gene-editing system does not remain
inside the cell, leaving the editing event as the only permanent
result of the cell manipulation. The high efficiency of gene
knockout by protein iTOP makes it appealing for research and
may open new therapeutic avenues for the treatment of genetic
disease. Beyond gene editing, protein iTOP may find application
in other areas, for example in the modulation of intracellular
signaling pathways, in cell differentiation/dedifferentiation, or
as adjuvants in dendritic cell immunization.

EXPERIMENTAL PROCEDURES 
Cell Lines 

COS7 cells were obtained from ATCC and H1 human ESCs from WiCell; KBM7 cells
were a gift from Dr. Brummelkamp, NKI Amsterdam. Mouse embryonic neural stem
cells (NSCs) were derived from E13.5 mouse embryos as described by Louis and
Reynolds (2005). Mouse gut organoids were derived from adult mice as reported by
Sato et al. (2009).

Cell Proliferation Assay 

Cell proliferation was determined using the Cell Proliferation ELISA kit, BrdU
(Roche, 11669915001) following the manufacturer's instructions. For more details,
see Supplemental Information.

Transduction Buffer 
5x Transduction Buffer

500 mM NaCl, 25 mM NaH2PO4, 250 mM NDSB-201, 150 mM glycerol, 75 mM
glycine, 1.25 mM MgCl2, 1 mM 2-mercaptoethanol, pH 8.0. For more details,
see Supplemental Information.
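Because the buffer is a 5x stock, every component ends up at one fifth of its listed concentration once it is diluted to 1x. The same five-fold factor applies to the 50 μl CRISPR mixture described under CRISPR-Cas9 Transduction (10 μl of 5x buffer in 50 μl). The sketch below simply tabulates this arithmetic. The helper name `dilute` is ours, and it assumes volumes are additive.

```python
# 5x transduction buffer composition from the recipe above (mM; pH 8.0).
FIVE_X_BUFFER_MM = {
    "NaCl": 500.0,
    "NaH2PO4": 25.0,
    "NDSB-201": 250.0,
    "glycerol": 150.0,
    "glycine": 75.0,
    "MgCl2": 1.25,
    "2-mercaptoethanol": 1.0,
}


def dilute(stock_mM: dict[str, float], stock_volume: float,
           total_volume: float) -> dict[str, float]:
    """Concentrations after mixing `stock_volume` of a stock into `total_volume`."""
    if not 0 < stock_volume <= total_volume:
        raise ValueError("stock_volume must be positive and at most total_volume")
    factor = stock_volume / total_volume
    return {name: conc * factor for name, conc in stock_mM.items()}


if __name__ == "__main__":
    # 1 volume of 5x buffer (with protein) plus 4 volumes of culture medium.
    for name, conc in dilute(FIVE_X_BUFFER_MM, 1.0, 5.0).items():
        print(f"{name}: {conc:g} mM")
```

At 1x this gives 100 mM NaCl, 5 mM NaH2PO4, 50 mM NDSB-201, 30 mM glycerol, 15 mM glycine, 0.25 mM MgCl2, and 0.2 mM 2-mercaptoethanol, on top of whatever the culture medium already contains.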

CRISPR/Cas9 Transduction Media 

Opti-MEM media (Life Technologies) supplemented with 542 mM NaCl,
333 mM GABA, 1.67x N2, 1.67x B27, 1.67x non-essential amino acids,
3.3 mM glutamine, 167 ng/ml bFGF2, and 84 ng/ml EGF. For more details
on media preparation and catalog numbers, see Supplemental Information.
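The 1.67x supplement levels are consistent with the media making up 30 of the 50 μl in the CRISPR transduction mixture (50/30 ≈ 1.67), which would bring them to about 1x after mixing. This is our reading of the numbers, not a statement from the text. The sketch below computes the resulting concentrations under two assumptions: volumes are additive, and the sgRNA solution contributes none of these components. The NaCl total also includes the 5x buffer's 500 mM NaCl diluted five-fold.

```python
# Component volumes (ul) of the transduction mixture described under
# "CRISPR-Cas9 Transduction".
VOLUMES_UL = {
    "cas9_in_5x_buffer": 10.0,
    "transduction_media": 30.0,
    "sgRNA_solution": 10.0,
}

# Supplements in the CRISPR/Cas9 transduction media, as listed above.
MEDIA_SUPPLEMENTS = {
    "NaCl (mM)": 542.0,
    "GABA (mM)": 333.0,
    "N2 (x)": 1.67,
    "B27 (x)": 1.67,
    "non-essential amino acids (x)": 1.67,
    "glutamine (mM)": 3.3,
    "bFGF2 (ng/ml)": 167.0,
    "EGF (ng/ml)": 84.0,
}

BUFFER_NACL_5X_MM = 500.0  # NaCl in the 5x transduction buffer


def final_concentrations() -> dict[str, float]:
    """Approximate concentrations in the assembled 50 ul mixture.

    Assumes the sgRNA solution contributes none of these components and
    that volumes are additive.
    """
    total = sum(VOLUMES_UL.values())
    media_fraction = VOLUMES_UL["transduction_media"] / total  # 30 / 50 = 0.6
    final = {name: conc * media_fraction for name, conc in MEDIA_SUPPLEMENTS.items()}
    # NaCl also arrives with the 5x buffer, diluted five-fold (10 / 50).
    final["NaCl (mM)"] += BUFFER_NACL_5X_MM * VOLUMES_UL["cas9_in_5x_buffer"] / total
    return final


if __name__ == "__main__":
    for name, value in final_concentrations().items():
        print(f"{name}: {value:.3g}")
```

Under these assumptions, the supplements land near 1x, GABA near 200 mM, and total NaCl near 425 mM.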

Recombinant Protein Production 

His-tagged recombinant proteins were produced in E. coli and purified using
Ni-agarose affinity chromatography. Detailed methods and procedures are
described in the Extended Experimental Procedures, as well as Tables S1
and S5.

Protein Transduction 

We have established two transduction protocols that work best for most cell
types and proteins to be transduced, typically yielding transduction
efficiencies of 60%-90%. In the first protocol, transduction is performed for
12 hr at an osmolality of 500 mOsmol/kg (protocol 12/500). In brief, a day
before protein transduction, cells were plated in the appropriate culture
media without antibiotics. The next day, 5x transduction buffer with the protein
of interest was mixed with cell culture media to obtain 1x transduction media
at a tonicity of 500 mOsmol/kg and added to the cells. Cells were incubated
for 12 hr, after which the transduction media were removed and exchanged
for regular culture media. In the second protocol, protein transduction is
performed for 3 hr at an osmolality of 700 mOsmol/kg (protocol 3/700).
Cells were plated as above. The next day, 1x transduction media with
the protein of interest were added as above, and the final osmolality was
adjusted to 700 mOsmol/kg using NaCl. Cells were incubated for 3 hr, after
which the transduction media were removed and exchanged for regular culture
media.
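For protocol 3/700, the amount of NaCl needed to move from 500 to 700 mOsmol/kg can be estimated with an ideal-solution approximation. Each mmol of NaCl contributes about 2 × φ mOsmol, where φ is the osmotic coefficient. The sketch below makes several assumptions that are not in the text: φ = 0.93, mM and mmol/kg treated as interchangeable, a starting point of 500 mOsmol/kg as in protocol 12/500, and a hypothetical 5 M NaCl stock. Because it is only an estimate, the final osmolality is best confirmed with an osmometer.

```python
def nacl_to_add_mM(current_mosm: float, target_mosm: float,
                   osmotic_coefficient: float = 0.93) -> float:
    """Approximate NaCl increment (mM) needed to raise osmolality.

    Treats mOsmol/kg and mM as interchangeable for dilute aqueous media and
    assumes NaCl contributes 2 * osmotic_coefficient osmoles per mole; the
    default coefficient of 0.93 is an assumed value near physiological strength.
    """
    if target_mosm < current_mosm:
        raise ValueError("target_mosm must not be below current_mosm")
    return (target_mosm - current_mosm) / (2.0 * osmotic_coefficient)


def stock_volume_ul(sample_ul: float, increment_mM: float,
                    stock_mM: float = 5000.0) -> float:
    """Volume of NaCl stock that raises NaCl by `increment_mM` after mixing.

    Solves stock_mM * v / (sample_ul + v) == increment_mM for v. The slight
    dilution of the solutes already present is ignored.
    """
    if not 0 <= increment_mM < stock_mM:
        raise ValueError("increment_mM must be non-negative and below stock_mM")
    return sample_ul * increment_mM / (stock_mM - increment_mM)


if __name__ == "__main__":
    # Going from protocol 12/500 media to the 700 mOsmol/kg of protocol 3/700.
    delta = nacl_to_add_mM(500, 700)
    print(f"{delta:.1f} mM NaCl")                                        # 107.5 mM NaCl
    print(f"{stock_volume_ul(1000, delta):.1f} ul of 5 M stock per ml")  # 22.0 ul of 5 M stock per ml
```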

Beta-Lactamase Transduction and Quantification of Beta- 
Lactamase Incorporation 

Beta-lactamase transduction in MEFs and mES cells was performed using
the 3/700 and 12/500 protocols, respectively. After protein transduction,
beta-lactamase activity was measured using the CCF2-AM loading kit
(Life Technologies, K1032) following the manufacturer's instructions. Relative
beta-lactamase activity was calculated as relative beta-lactamase activity (%)
= (X - B)/(A - B) × 100, where X is the beta-lactamase value of the sample,
A is the beta-lactamase value of a reference sample, and B is the
beta-lactamase value of cells transduced in isotonic media. For more details,
see Supplemental Information.
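The relative-activity formula is a two-point normalization: the isotonic control B maps to 0% and the reference sample A maps to 100%. The function below is a direct transcription of that formula. The readout values in the example are hypothetical, and the text does not specify which CCF2-AM readout (for example, a background-corrected emission ratio) serves as the "beta-lactamase value".

```python
def relative_beta_lactamase_activity(x: float, a: float, b: float) -> float:
    """Relative beta-lactamase activity (%) = (X - B) / (A - B) * 100.

    x: readout of the sample
    a: readout of the reference sample (defines 100%)
    b: readout of cells transduced in isotonic media (defines 0%)
    """
    if a == b:
        raise ValueError("reference (a) and baseline (b) readouts must differ")
    return (x - b) / (a - b) * 100.0


if __name__ == "__main__":
    # Hypothetical readouts, for illustration only.
    print(relative_beta_lactamase_activity(x=55.0, a=100.0, b=10.0))  # 50.0
```

Samples that read above the reference or below the isotonic control fall outside 0%-100% by construction.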

CRISPR-Cas9 Transduction 

KBM7 cells were transduced with Cas9 protein and sgRNA in transduction
media at 1,250 mOsmol/kg for 60 min in a 96-well format. In brief, 120,000
KBM7 cells were seeded per well in KBM7 media (IMDM supplemented
with 10% FBS, non-essential amino acids, GlutaMAX, and 2-mercaptoethanol).
The next day, 3 hr before transduction, cells were incubated with 250 ng/ml of
the interferon inhibitor B18R (eBiosciences). The cell culture media was then
removed, and the complete transduction mixture (10 μl of Cas9 in 5x transduction
buffer, 30 μl of CRISPR/Cas9 transduction media, and 10 μl of sgRNA solution)
was added to the cells. sgRNA and DNA plasmid sequences are shown in Data S1,
Data S2, and Data S3. Cells were incubated for 60 min at 37°C, after which the
transduction media were carefully replaced with standard culture media
supplemented with 250 ng/ml of B18R. Cells were incubated for 48 hr and
analyzed by flow cytometry. Knockout of the endogenous DPH7 gene was
determined by adding LFn-DT (Carette et al., 2009) and by DNA sequencing of
the surviving cells. Two rounds of Cas9/sgRNA transduction were performed as
above, with a recovery time between transductions of 5-7 days. See Tables S3
and S4 for primer sequences for genomic DNA amplification and the surveyor
assay.

[Figure 7B, 7C, and 7E: DPH7 target-site sequence alignments (not legible in this copy). Recoverable panel headers: DPH7 sgRNA #6, 12 of 12 sequences mutant (100%); DPH7 sgRNA #2, 15 of 15 sequences mutant (100%); DPH7 sgRNA #6, 18 of 18 sequences mutant (100%); DPH7 knockout clones, sgRNAs #2, #5, #3, and #6 (KBM7) and sgRNA #2 (H1 hESCs).]

H1 human ESCs were transduced as above with slight modifications.
Cells were passaged by mechanical dissociation into small clumps following
the mTeSR1 manufacturer's instructions and seeded on a Matrigel-coated plate.
Cells were transduced 2-3 days after seeding, when they reached a confluency
of 80%-90%. Two rounds of Cas9/sgRNA transduction were performed as above,
with a recovery time between transductions of 5-7 days. Cells
were not passaged between the two transductions.
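One way to see why two rounds of transduction can raise biallelic knockout is a simple probability model. This model is our assumption and was not tested in the study. Suppose each round disrupts a still-intact allele with the same probability, independently of earlier rounds, and edited alleles are neither lost nor reverted. The per-round rate of 0.5 used below is hypothetical.

```python
def cumulative_allele_disruption(per_round: float, rounds: int = 2) -> float:
    """Probability that a given allele is disrupted after `rounds` transductions.

    Assumes each round disrupts a not-yet-edited allele with the same
    probability `per_round`, independently of earlier rounds, and that
    edited alleles are neither lost nor reverted.
    """
    if not 0.0 <= per_round <= 1.0:
        raise ValueError("per_round must be in [0, 1]")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return 1.0 - (1.0 - per_round) ** rounds


def biallelic_fraction(per_round: float, rounds: int = 2) -> float:
    """Fraction of cells with both alleles disrupted, alleles independent."""
    return cumulative_allele_disruption(per_round, rounds) ** 2


if __name__ == "__main__":
    # Hypothetical per-round rate of 0.5, for illustration only.
    print(cumulative_allele_disruption(0.5, rounds=1), biallelic_fraction(0.5, rounds=1))  # 0.5 0.25
    print(cumulative_allele_disruption(0.5, rounds=2), biallelic_fraction(0.5, rounds=2))  # 0.75 0.5625
```

Because the biallelic fraction is the square of the per-allele rate, a second round improves it more than proportionally under this model.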

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, 
six figures, five tables, and three data files and can be found with this article 
online at http://dx.doi.org/10.1016/j.cell.2015.03.028.
Figure 7. Endogenous Gene Disruption Induced by CRISPR-Cas9 Transduction

(A) Top: schematic depiction of the DPH7 gene and the target sites of six different sgRNAs used in the experiments. Bottom: schematic depiction of the Cas9-sgRNA
transduction and diphtheria toxin selection procedure. KBM7 cells were transduced twice with Cas9 and DPH7 sgRNAs with a 7 day interval between
transductions. Controls were untreated cells and cells transduced with Cas9 together with an AAVS1 sgRNA (Ctrl). Seven days after the second transduction, cells were treated
with LFn-DTA. Bar graph shows the number of viable cells after 2 days of diphtheria toxin selection.

(B) Analysis of target-site mutations in the endogenous DPH7 gene in diphtheria-toxin-resistant KBM7 cells after transduction of recombinant Cas9 protein and
in-vitro-transcribed sgRNA. The wild-type (WT) sequence is shown at the top. The start codon is indicated by an underlined ATG. Deletions are indicated by dashes
and yellow background, and insertions by underlined green text. The sizes of the insertions (+) or deletions (D) are indicated to the right of each mutated site.
Numbers in brackets show how many times each sequence was obtained. In the wild-type sequences, the sgRNA binding site and PAM sequence are indicated in underlined
red and blue text, respectively. The primers used to amplify the different DPH7 genomic regions are listed in Table S3.

(C) Analysis of target-site mutations in the endogenous DPH7 gene in diphtheria-toxin-resistant H1 hESCs after transduction of recombinant Cas9 protein and
in-vitro-transcribed DPH7 sgRNAs. Annotation as in (B).

(D) Schematic representation of the experimental design used to quantify biallelic DPH7 gene knockout by Cas9-sgRNA transduction. KBM7 cells or
hESCs were transduced once or twice (as indicated) with Cas9 together with DPH7 sgRNAs. Controls were cells transduced with Cas9 and an AAVS1 sgRNA. After
3 days, single cells were sorted into 384-well plates using a flow cytometer. Seven days later, the number of expanding clones was counted, and cells were treated with
diphtheria toxin. After 2 days of diphtheria toxin treatment, surviving clones were counted. DPH7 knockout efficiency was calculated as the percentage of
single-cell clones that were diphtheria toxin resistant.

(E) Biallelic DNA sequence analysis of diphtheria-toxin-resistant clones in KBM7 cells and H1 hESCs. Annotation as in (B). A1, allele 1; A2, allele 2. The sizes of the
insertions (+) or deletions (D) are indicated to the right of each mutated site.
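The allele annotations in this figure use D for deletions and + for insertions, with a slash joining the parts of a compound event. The sketch below parses that notation into a net length change and flags frameshifts. The regular expression and function names are ours. A frameshift flag is only a coarse screen: an in-frame label such as D9 is not flagged, although in-frame changes can still disrupt a protein.

```python
import re

# One component of an annotation such as "D8", "+1", or "D10/+1".
_TOKEN = re.compile(r"(D|\+)(\d+)")


def net_indel(label: str) -> int:
    """Net length change (nt) for a figure label; deletions count as negative."""
    net = 0
    for token in label.split("/"):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized indel token: {token!r}")
        kind, size = match.groups()
        net += int(size) if kind == "+" else -int(size)
    return net


def is_frameshift(label: str) -> bool:
    """True when the net change is not a multiple of 3 nt."""
    return net_indel(label) % 3 != 0


if __name__ == "__main__":
    for label in ["D8", "+1", "D10/+1", "D9", "D2/+1", "+75"]:
        print(label, net_indel(label), is_frameshift(label))
```

For these labels, D8, +1, and D2/+1 are reported as frameshifts, while D10/+1 (net -9), D9, and +75 are in frame.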
(Cell 155, 894-908; November 7, 2013)

After the publication of this article, we noticed an error in the text describing the generation of cytosolic-trapped HDAC5 (GFP-HDAC5cyto)
in both the Results and Extended Experimental Procedures sections. The original text mistakenly referred to mutation
of serine residues 259 and 498 to aspartic acid, generating a mostly cytoplasmic HDAC5 (GFP-HDAC5cyto) mutant. This sentence
omitted that, in addition, serine residue 280 was mutated to alanine. Indeed, the sequential mutagenesis process of the three point
mutations was first serine 280 to alanine, second serine 259 to aspartic acid, and third serine 498 to aspartic acid. Hence, the accurate
description is that mutation of serine residues 259 and 498 to aspartic acid and serine residue 280 to alanine generated a mostly
cytoplasmic HDAC5 (GFP-HDAC5cyto) mutant. This error in the description of the HDAC5cyto construct, which was used in Figures
5E and S2A, does not affect the interpretation of the data, the results in the paper, or the overall conclusion of the study. We apologize
for any confusion that this error may have caused.
SnapShot: Sensing and
Signaling by Cilia

Kurt Zimmerman and Bradley K. Yoder 
Primary cilia are cellular appendages that coordinate diverse sensory and signaling activities. They are important for proper mammalian development, adult tissue homeostasis,
and vision and odorant detection, and their dysfunction contributes to disease pathology and developmental defects.

Signaling 

The most extensively defined cilia-associated signaling pathway is hedgehog (Hh). Hh ligands bind to the receptor patched (Ptc) located in the cilium. Ptc then exits the cilium,
relieving inhibition of Smoothened (Smo). Smo accumulates in the cilium, promotes activation and nuclear translocation of the Gli2 transcription factor, and prevents formation
of the Gli3 repressor. Many of the developmental defects observed in cilia mutants, such as polydactyly and neural tube mis-patterning, are caused by dysfunctional Hh signaling.
Hh signaling also regulates autophagy by acting on autophagy-related proteins at the cilium base.

In the absence of canonical Wnt, β-catenin is targeted for destruction by the GSK3β/Axin2/APC complex. Upon binding of Wnt to the Frizzled (Fz) receptor, Dishevelled (Dvl)
is recruited to Fz, resulting in the destruction of the β-catenin degradation complex. Stabilized β-catenin translocates to the nucleus, activating Wnt target genes. The noncanonical
Wnt pathway is independent of β-catenin and functions through calmodulin kinase (CamK) and JNK to cause cytoskeletal rearrangements that regulate cell morphology and
orientation in a planar field, referred to as planar cell polarity (PCP). Cilia mutant mice have PCP defects in the inner ear, and knockdown of ciliary proteins in zebrafish results
in PCP phenotypes, including defective convergent extension movements and tail malformations. Furthermore, the ciliary localized protein Inversin binds to and blocks Dvl-mediated
activation of the canonical Wnt pathway and functions as a switch between canonical and noncanonical Wnt pathways.

Platelet-derived growth factor receptor α (PDGFRα) accumulates in the cilium upon growth arrest. Following PDGF-AA binding, PDGFRα activates the Mek1/2-Erk1/2 pathway
or the PI3K/AKT pathway, promoting directional cell migration. Cells lacking cilia exhibit non-directed migration.

Upon binding of the Notch ligand to its receptors, the Notch intracellular domain (NICD) is cleaved and translocates to the nucleus, where it associates with the DNA-binding 
protein RBP-j. Notch receptors and processing enzymes co-localize in cilia of skin epidermal cells. Activation of the Notch pathway induces skin barrier formation that fails to 
occur in the absence of cilia. 

Both PKD patients and mouse models of ciliary dysfunction exhibit increased mTOR activity. mTOR activity is regulated through PC1 function, either directly upstream through
MEK/ERK or via a TSC-dependent pathway. Bending of the cilium decreases mTOR activity in an LKB1-dependent manner.

The hormone leptin signals through its receptor (ObRb) to regulate Jak/Stat activity and food intake. Bardet-Biedl syndrome (BBS) patients and mice exhibit obesity. BBS1
can bind ObRb and may mediate its localization near the base of cilia; however, ObRb has yet to be observed in the cilia. Mice lacking cilia in the leptin-responsive POMC
hypothalamic neurons also develop obesity; however, leptin signaling is normal in these mice. Thus, the connection between leptin and cilia remains controversial.

The insulin-like growth factor 1 (IGF-1) receptor is localized to the primary cilium in 3T3-L1 preadipocytes and has increased sensitivity to insulin stimulation compared to nonciliary
IGF-1 receptors. AKT and IRS-1 are recruited to the basal body during cilium formation and are phosphorylated by the receptor kinase located in the cilium. Loss of IFT88
or Kif3a, which disrupts ciliogenesis, prevents IGF-1-receptor-mediated adipocyte differentiation.

Mechanosensation 

In epithelial cells, bending the cilium by fluid flow increases cytoplasmic calcium through polycystin 1 (PC1) and polycystin 2 (PC2), which form a channel complex. Mutations
in PC1 or PC2 impair calcium signaling and result in polycystic kidney disease (PKD). The polycystin-related proteins PKD1L1 and PKD2L1 also act as ciliary localized calcium
channels.

Cilia function as mechanosensors on the embryonic node, an important structure formed during gastrulation that specifies the left-right body axis. Motile cilia on the node
generate fluid movement that is sensed by neighboring non-motile cilia. Deflection of the non-motile cilia induces a left-sided calcium signal that is dependent on PC2, thereby
explaining the left-right axis defects in PC2 mutant mice.

Cilia mechanosensation is also important for STAT signaling. STAT6 binds to the C-terminal tail (CTT) of PC1. The PC1 CTT is proteolytically cleaved in the absence of flow, allowing
STAT6 and the CTT to enter the nucleus and interact with the co-activator P100. Mutations preventing CTT cleavage cause PKD. PC1 also regulates STAT1 and STAT3, although this
may not depend on mechanosensation.

Vision and Smell

Light activation of the G-protein-coupled receptor (GPCR) rhodopsin (Rho), located in the outer segment of rods (a highly modified form of cilium), initiates the visual transduction
pathway through the G protein transducin. Transducin activates phosphodiesterase, causing cGMP to be converted to GMP. The drop in cGMP leads to closure of sodium channels
and hyperpolarization of the rod, which inhibits glutamate release. In the olfactory system, each olfactory neuron expresses a single type of olfactory receptor (OR) located in the
cilium. ORs are GPCRs that bind a specific odorant and activate adenylyl cyclase (AC) through the G protein associated with the OR. The AC-mediated increase in cAMP opens
cyclic nucleotide-gated ion channels in the cilia membrane, resulting in an influx of sodium and calcium and efflux of chloride, depolarizing the neuron.

Multiple other GPCRs localize to cilia, including somatostatin receptor 3 (Sstr3), melanin-concentrating hormone receptor 1 (Mchr1), serotonin subtype 6 receptor (5-HT6),
dopamine receptor 1, and G-protein-coupled receptor 161 (Gpr161), but the role of cilia in their regulation is not currently known.